AI Model Token Pricing in 2026: A Plain-English Comparison

Tables update live from OpenRouter.

If you build anything with AI, your bill comes down to one unit: the token. This guide explains how token pricing works, then compares current per-token rates across the major providers — OpenAI, Anthropic, Google, Mistral, and a few strong challengers — so you can pick the right model for your budget.

What is a token?

A token is a small chunk of text — usually part of a word — that a model reads and writes. It's not a whole word and not a single letter; it sits in between. As a rough rule of thumb:

1 token ≈ 4 characters of English text.
1,000 tokens ≈ 750 words (about 1.5 pages).
A short chat reply is often 300–800 tokens; a long document can be tens of thousands.
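The rules of thumb above can be turned into a quick estimator. This is a minimal sketch, assuming English text and the approximate ratios quoted in this guide; the function names are illustrative, and a provider's real tokenizer will give different counts.

```python
import math

CHARS_PER_TOKEN = 4        # rule of thumb from this guide: 1 token ≈ 4 characters
WORDS_PER_1K_TOKENS = 750  # rule of thumb from this guide: 1,000 tokens ≈ 750 words


def estimate_tokens_from_chars(text: str) -> int:
    """Rough token estimate from character count (English text only)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Rough token estimate from a word count."""
    return math.ceil(word_count * 1000 / WORDS_PER_1K_TOKENS)


if __name__ == "__main__":
    print(estimate_tokens_from_chars("How much will this prompt cost?"))  # 31 chars -> 8
    print(estimate_tokens_from_words(3000))                               # -> 4000
```

Treat the result as a budgeting estimate only. For billing-grade numbers, count tokens with the tokenizer of the model you plan to call.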

Almost every provider prices per 1 million tokens (often written "/M" or "MTok"), billed separately for what you send and what you get back.

Input vs output tokens — why output costs more

There are two meters running on every request:

Input tokens — everything you send: your prompt, system message, chat history, and any documents or context you attach.
Output tokens — everything the model generates back, including the hidden "thinking" tokens on reasoning models.

Output is almost always more expensive than input, typically 2× to 8×. For example, a flagship model might charge $2.50 per million input tokens but $15.00 per million output, a 6× gap. The practical lesson: token for token, capping how much the model writes saves more money than trimming your prompt.
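To make the input/output gap concrete, the sketch below compares removing 200 tokens from the prompt with removing 200 tokens from the reply. It uses the example prices from the paragraph above; the 200-token figure and the `savings` helper are assumptions chosen for illustration.

```python
INPUT_PRICE = 2.50    # USD per 1M input tokens (example price from the text)
OUTPUT_PRICE = 15.00  # USD per 1M output tokens (example price from the text)


def savings(tokens_removed: int, price_per_million: float) -> float:
    """USD saved per request by removing tokens billed at the given rate."""
    return tokens_removed * price_per_million / 1_000_000


# Hypothetical change: cut 200 tokens from either side of the request.
prompt_saving = savings(200, INPUT_PRICE)
reply_saving = savings(200, OUTPUT_PRICE)

if __name__ == "__main__":
    print(f"Trim 200 prompt tokens: saves ${prompt_saving:.4f} per request")
    print(f"Trim 200 reply tokens:  saves ${reply_saving:.4f} per request")
    print(f"Ratio: {reply_saving / prompt_saving:.0f}x")
```

At these prices the same 200 tokens are worth about six times more on the output side. If your prompts are far longer than your replies, trimming the prompt can still save more in total.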

💡 Two big discounts to know: most providers offer prompt caching (re-using a fixed prompt or document drops its input cost by ~90%) and a batch API (~50% off for non-urgent jobs that can wait minutes to hours). These can dramatically cut a real bill.
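The effect of these two discounts can be sketched as a small function. This is a simplified model, not any provider's billing logic: it assumes the approximate 90% and 50% figures above, assumes the batch discount applies to the whole request and stacks with caching (which may not hold for every provider), and ignores any extra charge for writing to the cache. The function and parameter names are hypothetical.

```python
def discounted_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: float,           # USD per 1M input tokens
    output_price: float,          # USD per 1M output tokens
    cached_input_tokens: int = 0,  # portion of the input served from the prompt cache
    cache_discount: float = 0.90,  # ~90% off cached input (approximate figure)
    use_batch: bool = False,
    batch_discount: float = 0.50,  # ~50% off batch jobs (approximate figure)
) -> float:
    """Estimated USD cost of one request after caching and batch discounts."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")

    fresh_input_tokens = input_tokens - cached_input_tokens
    input_cost = (
        fresh_input_tokens * input_price
        + cached_input_tokens * input_price * (1 - cache_discount)
    ) / 1_000_000
    output_cost = output_tokens * output_price / 1_000_000

    total = input_cost + output_cost
    if use_batch:
        # Assumption: the batch discount applies to the whole request.
        total *= 1 - batch_discount
    return total


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens (9,000 of them a cached document), 500 output.
    standard = discounted_cost(10_000, 500, 2.50, 15.00)
    cached = discounted_cost(10_000, 500, 2.50, 15.00, cached_input_tokens=9_000)
    batched = discounted_cost(10_000, 500, 2.50, 15.00, use_batch=True)
    print(f"standard: ${standard:.5f}  cached: ${cached:.5f}  batch: ${batched:.5f}")
```

With the example prices of $2.50 input and $15.00 output, this hypothetical request costs about $0.0325 at standard rates, about $0.0123 with the document cached, and about $0.0163 through the batch API.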

Pricing by provider (per 1M tokens)

OpenAI

The broadest line-up, from ultra-cheap nano models to premium reasoning. The current flagship is the GPT-5.5 family.

Model	Input /M	Output /M	Best for

Anthropic (Claude)

A clean three-tier line-up with a consistent 5× output-to-input ratio. The flagship is Claude Opus 4.8.

Model	Input /M	Output /M	Best for

Opus, Sonnet and Haiku all support a 1M-token context at standard rates, with 50% batch and ~90% caching discounts.

Google (Gemini)

Strong value at the low end and huge context windows. The newest workhorse is Gemini 3.5 Flash.

Model	Input /M	Output /M	Best for

Mistral

A Paris-based provider known for cheap output pricing, open-weight models you can self-host, and EU/GDPR data residency. The line-up is renamed often, so always confirm model names and prices on Mistral's own pricing page.

Model	Input /M	Output /M	Best for

Other strong options

DeepSeek — the budget champion. Recent models land around $0.14 input / $0.28 output per million, with a ~90% cache discount. It delivers roughly 90% of flagship quality for everyday chat, summarizing and extraction at a fraction of the cost.
xAI Grok — the current flagship Grok 4.3 is about $1.25 / $2.50 per million with a very large (1–2M token) context and live X/web search. (Note: older bargain SKUs like Grok 4.1 Fast were retired in May 2026.)
Meta Llama 4 — open-weight and free to download. Via hosts like Together, Fireworks or Groq it typically runs $0.05–$0.90 per million; self-hosting at scale can push effective costs below $0.10.

How much does a real request cost?

Take a typical request of 1,000 input tokens + 500 output tokens. The math is simple:

cost = (input_tokens ÷ 1,000,000 × input_price) + (output_tokens ÷ 1,000,000 × output_price)
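The formula above translates directly into code. The sketch below is a minimal version, with prices passed in as USD per 1M tokens; the two example calls use the Grok 4.3 and DeepSeek prices quoted earlier in this guide, and the function name is illustrative.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: float,   # USD per 1M input tokens
    output_price: float,  # USD per 1M output tokens
) -> float:
    """USD cost of one request at standard rates (no caching or batch discount)."""
    return (
        input_tokens / 1_000_000 * input_price
        + output_tokens / 1_000_000 * output_price
    )


if __name__ == "__main__":
    # The guide's typical request: 1,000 input tokens + 500 output tokens.
    grok = request_cost(1_000, 500, input_price=1.25, output_price=2.50)
    deepseek = request_cost(1_000, 500, input_price=0.14, output_price=0.28)
    print(f"Grok 4.3: ${grok:.4f} per request, ${grok * 100_000:,.0f} per 100,000")
    print(f"DeepSeek: ${deepseek:.5f} per request, ${deepseek * 100_000:,.0f} per 100,000")
```

Multiply the per-request figure by your expected request volume to get a monthly estimate.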

Model	Cost / request	Cost / 100,000 requests
GPT-5.5	$0.0200	$2,000
Claude Opus 4.8	$0.0175	$1,750
Claude Sonnet 4.6	$0.0105	$1,050
GPT-5.4	$0.0100	$1,000
Gemini 3.5 Flash	$0.0060	$600
Claude Haiku 4.5	$0.0035	$350
Grok 4.3	$0.0025	$250
Gemini Flash-Lite / GPT-4.1 nano	$0.0003	$30
DeepSeek	$0.0003	$28

The same simple request can cost roughly 70× more on a flagship than on a budget model ($0.0200 vs. about $0.0003), which is exactly why choosing the right model per task is the biggest lever on your bill.
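One way to see the effect of per-task model choice is to price a mixed workload. The sketch below uses the per-request costs for GPT-5.5 and Claude Haiku 4.5 from the table above; the 20% flagship share and the `blended_cost` helper are assumptions for illustration, not a recommendation.

```python
# Per-request costs (USD) taken from the table above.
FLAGSHIP_COST = 0.0200  # GPT-5.5
BUDGET_COST = 0.0035    # Claude Haiku 4.5


def blended_cost(
    total_requests: int,
    flagship_share: float,
    flagship_cost: float = FLAGSHIP_COST,
    budget_cost: float = BUDGET_COST,
) -> float:
    """Total USD cost when only a share of requests goes to the flagship model."""
    if not 0.0 <= flagship_share <= 1.0:
        raise ValueError("flagship_share must be between 0 and 1")
    flagship_requests = total_requests * flagship_share
    budget_requests = total_requests - flagship_requests
    return flagship_requests * flagship_cost + budget_requests * budget_cost


if __name__ == "__main__":
    all_flagship = blended_cost(100_000, flagship_share=1.0)
    routed = blended_cost(100_000, flagship_share=0.2)  # hypothetical 20/80 split
    print(f"All flagship: ${all_flagship:,.0f}")
    print(f"20% flagship: ${routed:,.0f}")
```

Under this hypothetical split, 100,000 requests cost about $680 instead of $2,000.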

Which model is most cost-effective?

Chatbots & support — most replies don't need a flagship. Claude Haiku 4.5, GPT-5.4 Mini, or Gemini Flash handle FAQ-style help at a fraction of the cost.
Coding & agents — worth paying up: Claude Opus 4.8, GPT-5.5, or GPT-5.2-Codex lead here. Gemini 3.5 Flash is a strong mid-budget choice.
Long context (whole codebases, long docs) — Gemini, Claude and Grok offer 1M+ token windows; combine with prompt caching to keep big contexts affordable.
High-volume, simple tasks (tagging, routing, extraction) — go cheapest: Gemini 2.5 Flash-Lite, GPT-4.1 nano, Mistral Small, or DeepSeek.
Tight budget, good-enough quality — DeepSeek and open-weight Llama 4 give the best value per dollar.

Want the exact, live number for your usage?

TokenSwarm's calculator pulls real-time prices for 300+ models and computes your cost instantly.

Open the live cost calculator →

A note on accuracy

AI prices have fallen fast — by some estimates ~80% between early 2025 and 2026 — and providers update rates and launch new models constantly. The figures above are accurate to the best of our knowledge as of June 2026, in USD per million tokens at standard rates (before caching or batch discounts). Always confirm the final price on the provider's own pricing page, or use the live calculator, before relying on a number for budgeting.

Want to spend less?

Read: How to cut your LLM API costs →