TokenSwarm
LLM Pricing & Cost Calculator

AI Model Token Pricing in 2026: A Plain-English Comparison

Tables update live from OpenRouter.

If you build anything with AI, your bill comes down to one unit: the token. This guide explains how token pricing works, then compares current per-token rates across the major providers — OpenAI, Anthropic, Google, Mistral, and a few strong challengers — so you can pick the right model for your budget.

What is a token?

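A token is a small chunk of text, usually part of a word, that a model reads and writes. It's not a whole word and not a single letter; it sits in between. As a rough rule of thumb, one token is about four characters of English text, or roughly three-quarters of a word, so 100 tokens come to about 75 words.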
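To turn the four-characters-per-token heuristic into a quick estimate before you send a request, a few lines of Python are enough. This is a minimal sketch: `estimate_tokens` and `CHARS_PER_TOKEN` are illustrative names, not part of any provider SDK. The result is only approximate, because every model family uses its own tokenizer, and non-English text and code can tokenize quite differently from English prose.

```python
import math

# Heuristic only: roughly 4 characters of English text per token.
# For exact counts, use the tokenizer of the model you actually call.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Roughly estimate how many tokens `text` will use (rounded up)."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN)


prompt = "Summarize the attached quarterly report in three bullet points."
print(f"~{estimate_tokens(prompt)} tokens")
```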

Almost every provider prices per 1 million tokens (often written "/M" or "MTok"), billed separately for what you send and what you get back.

Input vs output tokens — why output costs more

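There are two meters running on every request: input tokens, for everything you send (instructions, your prompt, and any documents or conversation history), and output tokens, for everything the model writes back.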

Output is almost always more expensive than input — typically 2× to 8×. For example, a flagship model might charge $2.50 per million input tokens but $15.00 per million output. The practical lesson: capping how much the model writes usually saves more money than trimming your prompt.
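Plugging the example rates from this paragraph into the cost formula makes the lesson concrete. This sketch takes a 1,000-token prompt with a 500-token answer and compares trimming 200 tokens from the prompt against capping the answer 200 tokens shorter; the request sizes are illustrative assumptions.

```python
# Example rates quoted above, in USD per 1M tokens.
INPUT_PRICE = 2.50
OUTPUT_PRICE = 15.00


def cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in USD at the example rates."""
    return (input_tokens / 1_000_000 * INPUT_PRICE
            + output_tokens / 1_000_000 * OUTPUT_PRICE)


baseline = cost(1_000, 500)
trim_prompt = cost(800, 500)   # 200 fewer input tokens
cap_output = cost(1_000, 300)  # 200 fewer output tokens

print(f"baseline:           ${baseline:.4f}")
print(f"saved by trimming:  ${baseline - trim_prompt:.4f}")
print(f"saved by capping:   ${baseline - cap_output:.4f}")
```

At these rates each output token costs six times as much as an input token, so the same 200-token cut saves six times more on the output side.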

💡 Two big discounts to know: most providers offer prompt caching (reusing a fixed prompt or document cuts the input cost of those cached tokens by ~90%) and a batch API (~50% off for non-urgent jobs that can wait minutes to hours). Together, these can cut a real bill dramatically.
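Here is a sketch of how the two discounts change the math, assuming a flat ~90% discount on cached input tokens and ~50% off an entire batch request, as the tip describes. Real pricing is more detailed: some providers charge extra to write to the cache, require a minimum prompt length before caching applies, or treat combined discounts differently. `discounted_cost` and its default discount values are illustrative assumptions, not any provider's published formula, and the sketch assumes the two discounts stack multiplicatively.

```python
def discounted_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: float,            # USD per 1M input tokens
    output_price: float,           # USD per 1M output tokens
    cached_input_tokens: int = 0,  # part of the input served from the prompt cache
    batch: bool = False,
    cache_discount: float = 0.90,  # assumed ~90% off cached input
    batch_discount: float = 0.50,  # assumed ~50% off batch jobs
) -> float:
    """Estimate one request's cost in USD with caching and/or batch discounts."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_input_tokens
    total = (
        fresh_tokens / 1_000_000 * input_price
        + cached_input_tokens / 1_000_000 * input_price * (1 - cache_discount)
        + output_tokens / 1_000_000 * output_price
    )
    if batch:
        total *= 1 - batch_discount
    return total


# A 10,000-token prompt where 9,000 tokens are a reused document, 500 tokens out,
# at example rates of $2.50 / $15.00 per 1M tokens.
standard = discounted_cost(10_000, 500, 2.50, 15.00)
cached = discounted_cost(10_000, 500, 2.50, 15.00, cached_input_tokens=9_000)
cached_batch = discounted_cost(10_000, 500, 2.50, 15.00,
                               cached_input_tokens=9_000, batch=True)

for label, value in [("standard", standard), ("cached", cached),
                     ("cached + batch", cached_batch)]:
    print(f"{label:>15}: ${value:.5f}")
```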

Pricing by provider (per 1M tokens)

OpenAI

The broadest line-up, from ultra-cheap nano models to premium reasoning models. The current flagship is the GPT-5.5 family.

Model | Input /M | Output /M | Best for

Anthropic (Claude)

A clean three-tier line-up with a consistent 5× output-to-input ratio. The flagship is Claude Opus 4.8.

Model | Input /M | Output /M | Best for

Opus, Sonnet and Haiku all support a 1M-token context at standard rates, with 50% batch and ~90% caching discounts.


Google (Gemini)

Strong value at the low end and huge context windows. The newest workhorse is Gemini 3.5 Flash.

Model | Input /M | Output /M | Best for

Mistral

A Paris-based provider known for cheap output pricing, open-weight models you can self-host, and EU/GDPR data residency. Model names change often, so always confirm current pricing on Mistral's own page.

Model | Input /M | Output /M | Best for

Other strong options

How much does a real request cost?

Take a typical request of 1,000 input tokens + 500 output tokens. The math is simple:

cost = (input_tokens ÷ 1,000,000 × input_price) + (output_tokens ÷ 1,000,000 × output_price)
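The formula maps directly onto a small function. This sketch uses Python's `decimal` module so the money arithmetic stays exact; `request_cost` is an illustrative name, and the $2.50 / $15.00 rates are the example figures from the input vs output section. For a 1,000-in / 500-out request they give the same $0.0100 per request and $1,000 per 100,000 requests listed for GPT-5.4.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: str, output_price: str) -> Decimal:
    """cost = input_tokens / 1M * input_price + output_tokens / 1M * output_price.

    Prices are USD per 1M tokens, passed as strings so Decimal keeps them exact.
    """
    return (Decimal(input_tokens) / MILLION * Decimal(input_price)
            + Decimal(output_tokens) / MILLION * Decimal(output_price))


per_request = request_cost(1_000, 500, "2.50", "15.00")
per_100k = per_request * 100_000
print(f"${per_request:.4f} per request, ${per_100k:,.0f} per 100,000 requests")
```

Passing prices as strings (for example `"2.50"`) avoids carrying binary floating-point error into the `Decimal` values.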

Model | Cost / request | Cost / 100,000 requests
GPT-5.5 | $0.0200 | $2,000
Claude Opus 4.8 | $0.0175 | $1,750
Claude Sonnet 4.6 | $0.0105 | $1,050
GPT-5.4 | $0.0100 | $1,000
Gemini 3.5 Flash | $0.0060 | $600
Claude Haiku 4.5 | $0.0035 | $350
Grok 4.3 | $0.0025 | $250
Gemini Flash-Lite / GPT-4.1 nano | $0.0003 | $30
DeepSeek | $0.0003 | $28

The same simple request can cost 70× more on a flagship than on a budget model — which is exactly why choosing the right model per task is the biggest lever on your bill.
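To see how that gap plays out at your own volume, this sketch scales the table's per-request figures to a monthly request count and ranks the models, cheapest first. The 250,000 requests per month is an arbitrary example, and `COST_PER_REQUEST` simply copies the table (DeepSeek's $28 per 100,000 requests works out to $0.00028 each).

```python
# Per-request costs in USD for 1,000 input + 500 output tokens, copied from the table.
COST_PER_REQUEST = {
    "GPT-5.5": 0.0200,
    "Claude Opus 4.8": 0.0175,
    "Claude Sonnet 4.6": 0.0105,
    "GPT-5.4": 0.0100,
    "Gemini 3.5 Flash": 0.0060,
    "Claude Haiku 4.5": 0.0035,
    "Grok 4.3": 0.0025,
    "Gemini Flash-Lite / GPT-4.1 nano": 0.0003,
    "DeepSeek": 0.00028,  # $28 per 100,000 requests
}


def monthly_costs(requests_per_month: int) -> list[tuple[str, float]]:
    """Return (model, monthly cost) pairs, cheapest first."""
    totals = [(model, cost * requests_per_month)
              for model, cost in COST_PER_REQUEST.items()]
    return sorted(totals, key=lambda pair: pair[1])


for model, total in monthly_costs(250_000):
    print(f"{model:<34} ${total:>10,.2f}")

gap = COST_PER_REQUEST["GPT-5.5"] / COST_PER_REQUEST["DeepSeek"]
print(f"Flagship vs cheapest: about {gap:.0f}x")
```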


Which model is most cost-effective?

Want the exact, live number for your usage?
TokenSwarm's calculator pulls real-time prices for 300+ models and computes your cost instantly.
Open the live cost calculator →

A note on accuracy

AI prices have fallen fast — by some estimates ~80% between early 2025 and 2026 — and providers update rates and launch new models constantly. The figures above are accurate to the best of our knowledge as of June 2026, in USD per million tokens at standard rates (before caching or batch discounts). Always confirm the final price on the provider's own pricing page, or use the live calculator, before relying on a number for budgeting.

Want to spend less?
Read: How to cut your LLM API costs →