What will your LLM API bill be next month?
Enter your expected call volume and prompt sizes. LLMCostCalc shows your monthly cost for every major Claude, GPT-5, and Gemini model, side by side, with context caching savings and a best-value recommendation.
How LLMCostCalc works
1. Enter your expected call volume

   Type your API calls per day, or pick a preset from 100/day to 100,000/day. This is the single biggest driver of your monthly cost, since every other number is multiplied by it.

2. Set your average prompt and response sizes

   Enter average input tokens (your prompt, system message, and any context) and average output tokens (the model's response). Presets cover short prompts (~500 tokens) through extended context (~32,000 tokens).

3. Adjust working days per month

   Defaults to 30. If your workload only runs on business days, set this to 20-22 to get a more accurate monthly figure.

4. Pick which models to compare

   All 8 flagship, balanced, and fast-tier models across Anthropic, OpenAI, and Google are selected by default. Deselect any you don't need, or use 'Select all' / 'Deselect all' to reset.

5. Toggle context caching if applicable

   If most of your input tokens are repeated across calls (a system prompt, tool schemas, RAG documents), enable caching savings. The calculator assumes 80% of input tokens are billed at the cached-read rate for Anthropic and Google models.

6. Read the comparison table and recommendation

   Each model shows its monthly input, output, and total cost, plus cost per call. Rows are color-coded by cost tier, the cheapest option is flagged, and a plain-English recommendation highlights the best cost-to-quality tradeoff.
What each part of the output means
The comparison table and recommendation box translate raw per-token pricing into numbers you can act on.
Each model's total monthly cost is color-coded so you can scan for budget fit at a glance: green is under $50/month, yellow is $50-500/month, orange is $500-2,000/month, and red is over $2,000/month. The cheapest selected model is flagged with a 'Cheapest' badge.
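The color bands above amount to a simple threshold lookup. The sketch below is illustrative only: `cost_tier` is a hypothetical name, and the handling of totals that land exactly on $500 or $2,000 is an assumption, since the page does not say which band those values fall into.

```python
def cost_tier(monthly_total_usd: float) -> str:
    """Map a monthly total in USD to the color band described above.

    Assumption: a total of exactly $500 counts as yellow and exactly
    $2,000 counts as orange; the page leaves those boundaries open.
    """
    if monthly_total_usd < 50:
        return "green"
    if monthly_total_usd <= 500:
        return "yellow"
    if monthly_total_usd <= 2000:
        return "orange"
    return "red"


if __name__ == "__main__":
    for total in (25.0, 300.0, 1500.0, 5000.0):
        print(f"${total:,.2f}/mo -> {cost_tier(total)}")
```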
1,000 calls/day, 2,000 input + 800 output tokens, 30 days
Gemini 2.5 Flash $11.70/mo (green — Cheapest)
Claude Haiku 4.5 $144.00/mo (yellow)
GPT-5 Pro $1,320.00/mo (orange)
Claude Opus 4.5 $2,700.00/mo (red)

Input cost and output cost are shown separately because output tokens are typically priced 3-5x higher than input tokens. If your output cost dominates, shortening responses (a lower max_tokens limit, or instructions that ask for concise answers) saves more than trimming the prompt. Cost per call is useful for attributing cost to a specific feature: multiply it by your feature's expected call volume.
Claude Sonnet 4.6 at 1,000 calls/day, 30 days:
Input: $180.00/mo (60,000,000 tokens × $3/MTok)
Output: $360.00/mo (24,000,000 tokens × $15/MTok)
Total: $540.00/mo
Cost per call: $0.018

The recommendation box is a plain-English summary that picks the highest-tier model (flagship, balanced, or fast) that still falls in the 'moderate cost or below' band. If the absolute cheapest model is a different one (usually a fast-tier model), it is named separately so you can decide whether the quality tradeoff is worth the saving.
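One way to read that selection rule is sketched below. This is not the calculator's actual code: `recommend`, `TIER_RANK`, and `MODERATE_CEILING_USD` are hypothetical names, the $500 ceiling assumes that 'moderate cost' means the yellow band, and breaking ties within a tier by lowest cost is also an assumption. The totals are the 1,000 calls/day figures quoted on this page.

```python
# Higher rank = higher tier.
TIER_RANK = {"fast": 0, "balanced": 1, "flagship": 2}

# Assumption: "moderate cost or below" means at most the top of the yellow band.
MODERATE_CEILING_USD = 500.0


def recommend(models):
    """models: list of (name, tier, monthly_total_usd) tuples.

    Returns (recommended, cheapest). Falls back to the cheapest model
    when nothing fits under the ceiling.
    """
    cheapest = min(models, key=lambda m: m[2])
    affordable = [m for m in models if m[2] <= MODERATE_CEILING_USD]
    if not affordable:
        return cheapest, cheapest
    # Prefer the highest tier; within a tier, prefer the lower cost (assumed tie-break).
    best = max(affordable, key=lambda m: (TIER_RANK[m[1]], -m[2]))
    return best, cheapest


models = [
    ("Gemini 2.5 Flash", "fast", 11.70),
    ("Claude Haiku 4.5", "fast", 144.00),
    ("Gemini 2.5 Pro", "balanced", 195.00),
    ("Claude Sonnet 4.6", "balanced", 540.00),
    ("GPT-5 Pro", "flagship", 1320.00),
    ("Claude Opus 4.5", "flagship", 2700.00),
]

best, cheapest = recommend(models)
print(f"Recommended: {best[0]} ({best[1]} tier) at ${best[2]:.2f}/month")
print(f"Cheapest: {cheapest[0]} at ${cheapest[2]:.2f}/month")
```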
"For this volume, Gemini 2.5 Pro (balanced tier) gives the
best performance-to-cost ratio at $195.00/month. If cost is the
only factor, Gemini 2.5 Flash is cheapest at $11.70/month."

Current LLM API pricing (June 2026)
Per-million-token rates used by this calculator. Prices change frequently: this table is also available inside the tool, and you can verify current rates against each provider's pricing page.
| Model | Input $/MTok | Output $/MTok |
|---|---|---|
| Claude Opus 4.5 | $15.00 | $75.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Claude Haiku 4.5 | $0.80 | $4.00 |
| GPT-5 Pro | $10.00 | $30.00 |
| GPT-5 | $2.50 | $10.00 |
| GPT-5 mini | $0.15 | $0.60 |
| Gemini 2.5 Pro | $1.25 | $5.00 |
| Gemini 2.5 Flash | $0.075 | $0.30 |
1 MTok = 1,000,000 tokens, roughly 750,000 words of English text. Anthropic and Google models also support context caching: cached input tokens are billed at roughly 10% (Anthropic) or 25% (Google) of the standard input rate.
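If you know your prompt length in words rather than tokens, the rule of thumb above gives a rough conversion for the token fields. The helper below is a hypothetical sketch: `estimate_tokens` is not part of the calculator, and real tokenizers vary by model and by language, so treat the result as a ballpark figure.

```python
# From the rule of thumb above: 1,000,000 tokens is roughly 750,000 English words.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for English text; actual counts depend on the tokenizer."""
    return round(word_count / WORDS_PER_TOKEN)


print(estimate_tokens(1500))  # 2000
```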
When to use LLMCostCalc
| Situation |
|---|
| Planning a new RAG pipeline |
| Deciding whether to downgrade from Sonnet to Haiku |
| Presenting an AI infrastructure estimate to a manager |
| Choosing between Claude Opus and GPT-5 Pro for an agentic workflow |
| Evaluating prompt caching for a large system prompt |
| Comparing direct API costs to GitHub Copilot's token billing |
Frequently Asked Questions
What is LLMCostCalc and how does it work?
LLMCostCalc is a free browser-based calculator that estimates your monthly LLM API bill across Anthropic Claude, OpenAI GPT-5, and Google Gemini models. Enter your expected call volume and average prompt/response sizes, and it calculates the input and output token cost for every model at current per-token pricing.
Instead of bouncing between three pricing pages and doing the multiplication yourself, you get a side-by-side cost comparison, a color-coded cost tier for each model, and a plain-English recommendation for the best cost-to-quality tradeoff.
How much does the Claude API cost per month?
It depends entirely on call volume, prompt size, and which Claude model you use. As a reference point, at 1,000 calls/day with 2,000 input tokens and 800 output tokens per call (30-day month), Claude Sonnet 4.6 costs roughly $540/month, Claude Haiku 4.5 costs roughly $144/month, and Claude Opus 4.5 costs roughly $2,700/month.
Enable the context caching toggle if most of your prompt is repeated across calls (a system prompt, RAG context, or few-shot examples): caching 80% of input tokens drops the Sonnet 4.6 example above to roughly $410/month.
Is Claude cheaper than GPT-5?
| | Anthropic Claude | OpenAI GPT-5 |
|---|---|---|
| Flagship tier | Opus 4.5: $15 / $75 per MTok | GPT-5 Pro: $10 / $30 per MTok |
| Balanced tier | Sonnet 4.6: $3 / $15 per MTok | GPT-5: $2.50 / $10 per MTok |
| Fast tier | Haiku 4.5: $0.80 / $4 per MTok | GPT-5 mini: $0.15 / $0.60 per MTok |
| Context caching | Yes, ~10% of input price for cached reads | Not modeled in this calculator |
At equivalent tiers, GPT-5 models are typically cheaper per token, especially at the fast/lightweight tier. Claude's prompt caching can close or reverse that gap for workloads with a large, repeated context (system prompts, tool definitions, RAG documents). The only way to know which is cheaper for your workload is to plug in your actual volume and token sizes, which is exactly what this calculator does.
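To make the "close or reverse" point concrete, the sketch below compares Claude Sonnet 4.6 with caching against GPT-5 without it, at two workloads. The function and constant names are hypothetical. The Sonnet multiplier uses the calculator's stated assumption (80% of input tokens at roughly 10% of the input rate), and GPT-5 is left uncached only because this calculator does not model caching for it.

```python
def monthly_total(calls_per_day, days, in_tokens, out_tokens,
                  in_rate, out_rate, input_multiplier=1.0):
    """Monthly cost in USD; rates are per million tokens."""
    calls = calls_per_day * days
    input_cost = calls * in_tokens / 1_000_000 * in_rate * input_multiplier
    output_cost = calls * out_tokens / 1_000_000 * out_rate
    return input_cost + output_cost


# 20% of input at the full rate + 80% at ~10% of the full rate (calculator's assumption).
SONNET_CACHED_MULTIPLIER = 0.20 + 0.80 * 0.10

workloads = {
    "2,000 in / 800 out": (2_000, 800),
    "20,000 in / 500 out": (20_000, 500),  # hypothetical context-heavy workload
}

for label, (in_tok, out_tok) in workloads.items():
    sonnet = monthly_total(1_000, 30, in_tok, out_tok, 3.00, 15.00,
                           SONNET_CACHED_MULTIPLIER)
    gpt5 = monthly_total(1_000, 30, in_tok, out_tok, 2.50, 10.00)
    print(f"{label}: Sonnet 4.6 cached ${sonnet:,.2f} vs GPT-5 ${gpt5:,.2f}")
```

Under these assumptions GPT-5 stays slightly cheaper for the short-prompt workload ($390.00 vs $410.40), while the context-heavy workload flips in Sonnet's favor ($729.00 vs $1,650.00).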
How do I calculate my LLM API costs manually?
The underlying formula is simple arithmetic. For a given model:
monthly_calls = calls_per_day * working_days
input_tokens_per_month = monthly_calls * avg_input_tokens
output_tokens_per_month = monthly_calls * avg_output_tokens
input_cost = (input_tokens_per_month / 1_000_000) * price_per_mtok_input
output_cost = (output_tokens_per_month / 1_000_000) * price_per_mtok_output
total_cost = input_cost + output_cost
cost_per_call = total_cost / monthly_calls

LLMCostCalc runs this same calculation for every selected model simultaneously, with an additional adjustment when context caching is enabled: 80% of input tokens are billed at the model's cached-read rate instead of the standard input rate.
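The formula and the caching adjustment can be written as a small runnable function. This is a sketch under stated assumptions, not the calculator's source (which runs in JavaScript): `Price`, `monthly_cost`, and `CACHED_SHARE` are illustrative names, and the cached-read ratio is the approximate figure quoted on this page.

```python
from dataclasses import dataclass
from typing import Optional

# Share of input tokens billed at the cached-read rate when caching is enabled.
CACHED_SHARE = 0.80


@dataclass
class Price:
    input_per_mtok: float
    output_per_mtok: float
    # Cached-read rate as a fraction of the input rate; None = caching not modeled.
    cached_read_ratio: Optional[float] = None


def monthly_cost(price, calls_per_day, working_days,
                 avg_input_tokens, avg_output_tokens, caching=False):
    """Return input, output, total, and per-call cost in USD for one model."""
    monthly_calls = calls_per_day * working_days
    input_mtok = monthly_calls * avg_input_tokens / 1_000_000
    output_mtok = monthly_calls * avg_output_tokens / 1_000_000

    if caching and price.cached_read_ratio is not None:
        cached_rate = price.input_per_mtok * price.cached_read_ratio
        blended_rate = ((1 - CACHED_SHARE) * price.input_per_mtok
                        + CACHED_SHARE * cached_rate)
        input_cost = input_mtok * blended_rate
    else:
        input_cost = input_mtok * price.input_per_mtok

    output_cost = output_mtok * price.output_per_mtok
    total = input_cost + output_cost
    per_call = total / monthly_calls if monthly_calls else 0.0
    return {"input": input_cost, "output": output_cost,
            "total": total, "per_call": per_call}


sonnet = Price(input_per_mtok=3.00, output_per_mtok=15.00, cached_read_ratio=0.10)
plain = monthly_cost(sonnet, 1_000, 30, 2_000, 800)
cached = monthly_cost(sonnet, 1_000, 30, 2_000, 800, caching=True)
print(f"Without caching: ${plain['total']:.2f}/mo, ${plain['per_call']:.3f} per call")
print(f"With caching:    ${cached['total']:.2f}/mo")
```

With the Claude Sonnet 4.6 example from this page, the function reproduces $540.00/month and $0.018 per call without caching, and $410.40/month with caching enabled.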
Does Anthropic or Google offer context caching to reduce costs?
Yes. Anthropic's prompt caching and Google's context caching both let you cache large, frequently reused portions of your prompt (system instructions, tool schemas, RAG documents) so that subsequent calls are billed at a fraction of the standard input token price for the cached portion.
The 'Context caching savings' toggle in LLMCostCalc models this by billing 80% of input tokens at the cached-read rate (roughly 10% of the standard input price for Anthropic, 25% for Google) and the remaining 20% at the standard rate. This is a simplification: actual savings depend on your cache hit rate and how your provider charges for cache writes.
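Because real savings depend on the cache hit rate, it can help to see how the effective input price moves as that rate changes. The sketch below uses a hypothetical helper, `effective_input_multiplier`, with the approximate cached-read ratios quoted above. It ignores cache-write charges entirely, so it overstates savings for prompts that are cached but rarely reused.

```python
def effective_input_multiplier(hit_rate: float, cached_read_ratio: float) -> float:
    """Fraction of the standard input price paid on average.

    hit_rate: share of input tokens served from cache (0.0 to 1.0).
    cached_read_ratio: cached-read price as a fraction of the input price.
    Cache-write charges are not modeled (simplifying assumption).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) + hit_rate * cached_read_ratio


# Approximate ratios from this page: ~10% for Anthropic, ~25% for Google.
for hit_rate in (0.50, 0.80, 0.95):
    anthropic = effective_input_multiplier(hit_rate, 0.10)
    google = effective_input_multiplier(hit_rate, 0.25)
    print(f"hit rate {hit_rate:.0%}: Anthropic x{anthropic:.3f}, Google x{google:.3f}")
```

At the calculator's fixed 80% assumption, input tokens cost about 28% of the standard rate on Anthropic models and about 40% on Google models.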
Does this calculator send my usage data anywhere?
No. All pricing data is a hardcoded JSON object and all calculations run in JavaScript in your browser. Nothing is sent to any server, there's no API call, and no account is required.
What is the cheapest LLM API for high volume?
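At very high call volumes, the fast/lightweight tier models dominate: Gemini 2.5 Flash ($0.075 / $0.30 per MTok), GPT-5 mini ($0.15 / $0.60 per MTok), and Claude Haiku 4.5 ($0.80 / $4 per MTok) are typically the cheapest options, often 10-50x cheaper than flagship models for the same volume and sometimes far more. In the 1,000 calls/day example above, Claude Haiku 4.5 ($144/month) is about 19x cheaper than Claude Opus 4.5 ($2,700/month), and Gemini 2.5 Flash ($11.70/month) is about 230x cheaper.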
These models are well suited for classification, extraction, summarization, and other tasks that don't require frontier reasoning. A common pattern is to route routine, high-volume requests to a fast model and reserve flagship models (Opus, GPT-5 Pro) for tasks that genuinely need them. Select only the fast-tier models in the calculator to compare them directly at your volume.
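The routing pattern can be estimated from two calculator results. The sketch below is a simplification with a hypothetical helper name: it assumes both routes see the same average token sizes, so each model's share of the bill scales linearly with its share of calls.

```python
def blended_monthly_cost(fast_total: float, flagship_total: float,
                         flagship_share: float) -> float:
    """Estimated bill when flagship_share of calls go to the flagship model.

    fast_total / flagship_total: what each model would cost if it handled
    100% of the traffic (the totals shown in the comparison table).
    Assumes identical average token sizes on both routes.
    """
    if not 0.0 <= flagship_share <= 1.0:
        raise ValueError("flagship_share must be between 0 and 1")
    return (1.0 - flagship_share) * fast_total + flagship_share * flagship_total


# Totals from the 1,000 calls/day example: Gemini 2.5 Flash and Claude Opus 4.5.
for share in (0.05, 0.10, 0.25):
    cost = blended_monthly_cost(11.70, 2700.00, share)
    print(f"{share:.0%} of calls to flagship: ${cost:,.2f}/mo")
```

In that example, sending 10% of calls to Claude Opus 4.5 and the rest to Gemini 2.5 Flash comes to about $280.53/month, compared with $2,700/month for Opus alone.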