DeepSeek enters the fight for token volume, Anthropic continues to dominate spend

May 2026 summary

Low-cost models saw significant production volume for the first time

In May 2026, DeepSeek held 17% of monthly tokens, putting it third on the gateway by token volume.
DeepSeek was prominent in the previous token volume chart, but is nearly invisible in this spend chart.
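
The gap between the two charts comes down to price per token: a provider can carry a large share of tokens and still account for a sliver of spend. As a minimal sketch of that arithmetic, the snippet below uses hypothetical providers, token counts, and prices (none of these figures come from the report) to compute both shares side by side.

```python
# Hypothetical usage records: (provider, tokens, price in USD per million tokens).
# All names and numbers are invented for illustration; they are not report data.
usage = [
    ("provider_a", 500_000_000, 10.00),
    ("provider_b", 300_000_000, 4.00),
    ("provider_c", 200_000_000, 0.50),
]

def shares(records):
    """Return {provider: (token_share, spend_share)} as fractions of the totals."""
    total_tokens = sum(tokens for _, tokens, _ in records)
    # Spend per provider = tokens (in millions) * price per million tokens.
    spend = {name: tokens / 1_000_000 * price for name, tokens, price in records}
    total_spend = sum(spend.values())
    return {
        name: (tokens / total_tokens, spend[name] / total_spend)
        for name, tokens, _ in records
    }

result = shares(usage)
for name, (token_share, spend_share) in result.items():
    print(f"{name}: {token_share:.1%} of tokens, {spend_share:.1%} of spend")
```

In this made-up example, the cheapest provider holds a fifth of the tokens but under 2% of spend, which is the same pattern the two charts show.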

Frontier labs continued to capture a majority of new spend

In April 2026, xAI and MiniMax drove significant token volume in the coding agent use case.
In May 2026, DeepSeek took almost half of the coding agent use case, with xAI and MiniMax dropping off significantly. Back-office workloads stayed Anthropic-heavy across both months.
In April 2026, Anthropic was the go-to frontier lab for high-stakes use cases like AI app generation, back office agents, and AI coding agents.
Anthropic continued to own high-stakes use cases in May 2026, even with DeepSeek V4's significant gain in token volume.

Cost discipline became a routing strategy

When Gemini 3.5 Flash launched in May at a higher price than Gemini 3, migration didn’t happen at scale.
When Gemini 3.1 Pro launched in February, it gained 30% adoption immediately, and by the next month was the dominant model in the family.

Conclusion: Cost-effective, capable options mean smarter model mixes

Appendix

Token vs cost share by B2B classification

B2C drives token volume while B2B drives spend.

Agent tool use across tokens and requests

Agentic traffic remains far more token-heavy than its request share suggests, running about 2.5x denser per request on average.
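
One plausible reading of "denser per request" is the ratio of mean tokens per request for agentic traffic to the same mean for all other traffic. The report does not spell out its exact definition, so treat the sketch below as an assumption. The function name and the totals are hypothetical, with the inputs chosen so the result lands on 2.5x.

```python
def density_ratio(agentic_tokens, agentic_requests, other_tokens, other_requests):
    """Ratio of mean tokens per request: agentic traffic vs. all other traffic."""
    agentic_mean = agentic_tokens / agentic_requests
    other_mean = other_tokens / other_requests
    return agentic_mean / other_mean

# Hypothetical totals, invented for illustration (not report data):
# agentic requests average 5,000 tokens, all others average 2,000.
ratio = density_ratio(
    agentic_tokens=5_000_000, agentic_requests=1_000,
    other_tokens=6_000_000, other_requests=3_000,
)
print(f"{ratio:.1f}x")
```

With these inputs, agentic traffic is only a quarter of requests but nearly half of tokens, which is how a small request share can hide a large token footprint.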

Model diversity distribution by request volume

Model diversity rises with scale. At 1M+ requests, teams route across 11 distinct models or more.

Cost vs volume share by use case

Volume-heavy workloads run cheaper per token. High-stakes workloads drive far less volume, but are expensive.

Previous reports

About this data