AI Model Token Pricing in 2026: A Plain-English Comparison
If you build anything with AI, your bill comes down to one unit: the token. This guide explains how token pricing works, then compares current per-token rates across the major providers — OpenAI, Anthropic, Google, Mistral, and a few strong challengers — so you can pick the right model for your budget.
What is a token?
A token is a small chunk of text — usually part of a word — that a model reads and writes. It's not a whole word and not a single letter; it sits in between. As a rough rule of thumb:
- 1 token ≈ 4 characters of English text.
- 1,000 tokens ≈ 750 words (about 1.5 pages).
- A short chat reply is often 300–800 tokens; a long document can be tens of thousands.
Almost every provider prices per 1 million tokens (often written "/M" or "MTok"), billed separately for what you send and what you get back.
Input vs output tokens — why output costs more
There are two meters running on every request:
- Input tokens — everything you send: your prompt, system message, chat history, and any documents or context you attach.
- Output tokens — everything the model generates back, including the hidden "thinking" tokens on reasoning models.
Output is almost always more expensive than input — typically 2× to 8×. For example, a flagship model might charge $2.50 per million input tokens but $15.00 per million output. The practical lesson: capping how much the model writes usually saves more money than trimming your prompt.
Pricing by provider (per 1M tokens)
OpenAI
The broadest line-up, from ultra-cheap nano models to premium reasoning. The current flagship is the GPT-5.5 family.
| Model | Input /M | Output /M | Best for |
|---|
Anthropic (Claude)
A clean three-tier line-up with a consistent 5× output-to-input ratio. The flagship is Claude Opus 4.8.
| Model | Input /M | Output /M | Best for |
|---|
Opus, Sonnet and Haiku all support a 1M-token context at standard rates, with 50% batch and ~90% caching discounts.
Google (Gemini)
Strong value at the low end and huge context windows. The newest workhorse is Gemini 3.5 Flash.
| Model | Input /M | Output /M | Best for |
|---|
Mistral
A Paris-based provider known for cheap output pricing, open-weight models you can self-host, and EU/GDPR data residency. The lineup is renamed often, so always confirm on Mistral's own page.
| Model | Input /M | Output /M | Best for |
|---|
Other strong options
- DeepSeek — the budget champion. Recent models land around $0.14 input / $0.28 output per million, with a ~90% cache discount. It delivers roughly 90% of flagship quality for everyday chat, summarizing and extraction at a fraction of the cost.
- xAI Grok — the current flagship Grok 4.3 is about $1.25 / $2.50 per million with a very large (1–2M token) context and live X/web search. (Note: older bargain SKUs like Grok 4.1 Fast were retired in May 2026.)
- Meta Llama 4 — open-weight and free to download. Via hosts like Together, Fireworks or Groq it typically runs $0.05–$0.90 per million; self-hosting at scale can push effective costs below $0.10.
How much does a real request cost?
Take a typical request of 1,000 input tokens + 500 output tokens. The math is simple:
| Model | Cost / request | Cost / 100,000 requests |
|---|---|---|
| GPT-5.5 | $0.0200 | $2,000 |
| Claude Opus 4.8 | $0.0175 | $1,750 |
| Claude Sonnet 4.6 | $0.0105 | $1,050 |
| GPT-5.4 | $0.0100 | $1,000 |
| Gemini 3.5 Flash | $0.0060 | $600 |
| Claude Haiku 4.5 | $0.0035 | $350 |
| Grok 4.3 | $0.0025 | $250 |
| Gemini Flash-Lite / GPT-4.1 nano | $0.0003 | $30 |
| DeepSeek | $0.0003 | $28 |
The same simple request can cost 70× more on a flagship than on a budget model — which is exactly why choosing the right model per task is the biggest lever on your bill.
Which model is most cost-effective?
- Chatbots & support — most replies don't need a flagship. Claude Haiku 4.5, GPT-5.4 Mini, or Gemini Flash handle FAQ-style help at a fraction of the cost.
- Coding & agents — worth paying up: Claude Opus 4.8, GPT-5.5, or GPT-5.2-Codex lead here. Gemini 3.5 Flash is a strong mid-budget choice.
- Long context (whole codebases, long docs) — Gemini, Claude and Grok offer 1M+ token windows; combine with prompt caching to keep big contexts affordable.
- High-volume, simple tasks (tagging, routing, extraction) — go cheapest: Gemini 2.5 Flash-Lite, GPT-4.1 nano, Mistral Small, or DeepSeek.
- Tight budget, good-enough quality — DeepSeek and open-weight Llama 4 give the best value per dollar.
A note on accuracy
AI prices have fallen fast — by some estimates ~80% between early 2025 and 2026 — and providers update rates and launch new models constantly. The figures above are accurate to the best of our knowledge as of June 2026, in USD per million tokens at standard rates (before caching or batch discounts). Always confirm the final price on the provider's own pricing page, or use the live calculator, before relying on a number for budgeting.