Free · No login · Runs in your browser

What will your LLM API bill be next month?

Enter your expected call volume and prompt sizes. LLMCostCalc shows your monthly cost for every major Claude, GPT-5, and Gemini model, side by side, with context caching savings and a best-value recommendation.

Zeeshan Tofiq

Full Stack Developer

How LLMCostCalc works

1. Enter your expected call volume
Type your API calls per day, or pick a preset from 100/day to 100,000/day. This is the single biggest driver of your monthly cost, since every other number is multiplied by it.
2. Set your average prompt and response sizes
Enter average input tokens (your prompt, system message, and any context) and average output tokens (the model's response). Presets cover short prompts (~500 tokens) through extended context (~32,000 tokens).
3. Adjust working days per month
Defaults to 30. If your workload only runs on business days, set this to 20-22 to get a more accurate monthly figure.
4. Pick which models to compare
All 8 flagship, balanced, and fast-tier models across Anthropic, OpenAI, and Google are selected by default. Deselect any you don't need, or use 'Select all' / 'Deselect all' to reset.
5. Toggle context caching if applicable
If most of your input tokens are repeated across calls (a system prompt, tool schemas, RAG documents), enable caching savings. The calculator assumes 80% of input tokens are billed at the cached-read rate for Anthropic and Google models.
6. Read the comparison table and recommendation
Each model shows its monthly input, output, and total cost, plus cost per call. Rows are color-coded by cost tier, the cheapest option is flagged, and a plain-English recommendation highlights the best cost-to-quality tradeoff.

What each part of the output means

The comparison table and recommendation box translate raw per-token pricing into numbers you can act on.

Cost tiers (row color)

Each model's total monthly cost is color-coded so you can scan for budget fit at a glance: green is under $50/month, yellow is $50-500/month, orange is $500-2,000/month, and red is over $2,000/month. The cheapest selected model is flagged with a 'Cheapest' badge.

1,000 calls/day, 2,000 input + 800 output tokens, 30 days

Gemini 2.5 Flash    $11.70/mo    (green — Cheapest)
Claude Haiku 4.5    $144.00/mo   (yellow)
GPT-5 Pro           $1,320.00/mo (orange)
Claude Opus 4.5     $2,700.00/mo (red)
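
The color bands above can be expressed as a small lookup. This is a minimal sketch, not the tool's actual code: the function name `cost_tier` is hypothetical, and how the exact boundary values ($50, $500, $2,000) are assigned is an assumption, since the text only gives the ranges.

```python
def cost_tier(monthly_cost: float) -> str:
    """Map a total monthly cost in USD to the row color described above."""
    # Assumption: a cost exactly on a boundary falls into the lower band
    # for $500 and $2,000, and into yellow for $50.
    if monthly_cost < 50:
        return "green"
    if monthly_cost <= 500:
        return "yellow"
    if monthly_cost <= 2000:
        return "orange"
    return "red"


# Totals from the example above (1,000 calls/day, 2,000 in + 800 out, 30 days).
example_totals = {
    "Gemini 2.5 Flash": 11.70,
    "Claude Haiku 4.5": 144.00,
    "GPT-5 Pro": 1320.00,
    "Claude Opus 4.5": 2700.00,
}

# The 'Cheapest' badge goes to the lowest total among the selected models.
cheapest = min(example_totals, key=example_totals.get)

for model, total in example_totals.items():
    badge = " (Cheapest)" if model == cheapest else ""
    print(f"{model}: ${total:,.2f}/mo -> {cost_tier(total)}{badge}")
```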

Input/Output split and cost per call

Input cost and output cost are shown separately because output tokens are priced 3-5x higher than input tokens for every model in this calculator. If your output cost dominates, shortening responses (a lower max_tokens limit, or instructions that ask for concise answers) saves more than trimming the prompt. Cost per call is useful for attributing cost to a specific feature: multiply it by that feature's expected monthly call volume.

Claude Sonnet 4.6 at 1,000 calls/day, 2,000 input + 800 output tokens, 30 days:

Input:  $180.00/mo  (60,000,000 tokens × $3/MTok)
Output: $360.00/mo  (24,000,000 tokens × $15/MTok)
Total:  $540.00/mo
Cost per call: $0.018
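
To illustrate the attribution idea, the sketch below recomputes the Sonnet 4.6 cost per call and applies it to a single feature. The feature volume (250 calls/day) is a made-up figure for illustration, and the function name is hypothetical.

```python
# Claude Sonnet 4.6 rates in $/MTok, taken from the pricing table on this page.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def cost_per_call(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one call with the given token counts."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


per_call = cost_per_call(2_000, 800)

# Share of each call's cost that comes from output tokens.
output_share = (800 / 1_000_000 * OUTPUT_PRICE_PER_MTOK) / per_call

# Hypothetical feature: 250 calls/day over a 30-day month.
feature_calls_per_month = 250 * 30
feature_cost = per_call * feature_calls_per_month

print(f"Cost per call: ${per_call:.3f}")
print(f"Output share of cost: {output_share:.0%}")
print(f"Feature cost: ${feature_cost:,.2f}/mo")
```

With these inputs, about two thirds of each call's cost comes from output tokens, which is why response length is the first thing to look at.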

Recommendation

A plain-English summary that picks a model from the highest tier (flagship, then balanced, then fast) whose monthly cost still falls in the 'moderate cost or below' band. If the absolute cheapest model is a different one (usually a fast-tier model), it's named separately so you can decide whether the quality tradeoff is worth the saving.

"For this volume, Gemini 2.5 Pro (balanced tier) gives the
best performance-to-cost ratio at $195.00/month. If cost is the
only factor, Gemini 2.5 Flash is cheapest at $11.70/month."
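
One way the selection rule could work is sketched below. Several details are assumptions rather than documented behavior: that 'moderate cost or below' means at most $500/month (the top of the yellow band), and that ties within a tier are broken by lowest cost. The names `recommend`, `TIER_RANK`, and `MODERATE_CEILING` are hypothetical.

```python
TIER_RANK = {"fast": 0, "balanced": 1, "flagship": 2}
MODERATE_CEILING = 500.00  # assumption: 'moderate' = top of the yellow band

# (model, tier, total monthly cost) for 1,000 calls/day, 2,000 in + 800 out,
# 30 days, computed from the pricing table on this page.
rows = [
    ("Claude Opus 4.5", "flagship", 2700.00),
    ("GPT-5 Pro", "flagship", 1320.00),
    ("Claude Sonnet 4.6", "balanced", 540.00),
    ("GPT-5", "balanced", 390.00),
    ("Gemini 2.5 Pro", "balanced", 195.00),
    ("Claude Haiku 4.5", "fast", 144.00),
    ("GPT-5 mini", "fast", 23.40),
    ("Gemini 2.5 Flash", "fast", 11.70),
]


def recommend(rows):
    """Return (best_value_row_or_None, cheapest_row)."""
    affordable = [r for r in rows if r[2] <= MODERATE_CEILING]
    best = None
    if affordable:
        # Prefer the highest tier; within that tier, prefer the lowest cost.
        best = max(affordable, key=lambda r: (TIER_RANK[r[1]], -r[2]))
    cheapest = min(rows, key=lambda r: r[2])
    return best, cheapest


best, cheapest = recommend(rows)
print(f"Best value: {best[0]} ({best[1]} tier) at ${best[2]:,.2f}/month")
print(f"Cheapest:   {cheapest[0]} at ${cheapest[2]:,.2f}/month")
```

Under these assumptions the sketch reproduces the sample recommendation: Gemini 2.5 Pro as best value and Gemini 2.5 Flash as cheapest.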

Current LLM API pricing (June 2026)

Per-million-token rates used by this calculator. This table is also available inside the tool. Prices change frequently, so verify current rates against each provider's pricing page before committing to a budget.

Model	Tier	Input $/MTok	Output $/MTok
Claude Opus 4.5	Flagship	$15.00	$75.00
Claude Sonnet 4.6	Balanced	$3.00	$15.00
Claude Haiku 4.5	Fast	$0.80	$4.00
GPT-5 Pro	Flagship	$10.00	$30.00
GPT-5	Balanced	$2.50	$10.00
GPT-5 mini	Fast	$0.15	$0.60
Gemini 2.5 Pro	Balanced	$1.25	$5.00
Gemini 2.5 Flash	Fast	$0.075	$0.30

1 MTok = 1,000,000 tokens, roughly 750,000 words of English text. Anthropic and Google models also support context caching: cached input tokens are billed at roughly 10% (Anthropic) or 25% (Google) of the standard input rate.

When to use LLMCostCalc

Situation	What to do
Planning a new RAG pipeline	Enter expected daily query volume with your typical retrieved-context size as input tokens, and compare against your budget ceiling
Deciding whether to downgrade from Sonnet to Haiku	Select both models with your current volume to see the exact dollar difference before changing your production model
Presenting an AI infrastructure estimate to a manager	Set your projected volume and screenshot the comparison table and recommendation for the proposal
Choosing between Claude Opus and GPT-5 Pro for an agentic workflow	Select only the flagship-tier models and compare their cost at your expected session volume; at the rates listed above, Opus 4.5 costs 1.5x (input) to 2.5x (output) as much per token as GPT-5 Pro
Evaluating prompt caching for a large system prompt	Toggle context caching savings on and off with the same volume to see the exact monthly saving
Comparing direct API costs to GitHub Copilot's token billing	Use this calculator for the direct-API side, then compare against the Copilot Credit Calculator for the same workload

Frequently Asked Questions

What is LLMCostCalc and how does it work?

LLMCostCalc is a free browser-based calculator that estimates your monthly LLM API bill across Anthropic Claude, OpenAI GPT-5, and Google Gemini models. Enter your expected call volume and average prompt/response sizes, and it calculates the input and output token cost for every model at current per-token pricing.

Instead of bouncing between three pricing pages and doing the multiplication yourself, you get a side-by-side cost comparison, a color-coded cost tier for each model, and a plain-English recommendation for the best cost-to-quality tradeoff.

How much does the Claude API cost per month?

It depends entirely on call volume, prompt size, and which Claude model you use. As a reference point, at 1,000 calls/day with 2,000 input tokens and 800 output tokens per call (30-day month), Claude Sonnet 4.6 costs roughly $540/month, Claude Haiku 4.5 costs roughly $144/month, and Claude Opus 4.5 costs roughly $2,700/month.

Enable the context caching toggle if most of your prompt is repeated across calls (a system prompt, RAG context, or few-shot examples): caching 80% of input tokens drops the Sonnet 4.6 example above to roughly $410/month.

Is Claude cheaper than GPT-5?

	Anthropic Claude	OpenAI GPT-5
Flagship tier	Opus 4.5: $15 / $75 per MTok	GPT-5 Pro: $10 / $30 per MTok
Balanced tier	Sonnet 4.6: $3 / $15 per MTok	GPT-5: $2.50 / $10 per MTok
Fast tier	Haiku 4.5: $0.80 / $4 per MTok	GPT-5 mini: $0.15 / $0.60 per MTok
Context caching	Yes, ~10% of input price for cached reads	Not modeled in this calculator

At equivalent tiers, GPT-5 models are typically cheaper per token, especially at the fast/lightweight tier. Claude's prompt caching can close or reverse that gap for workloads with a large, repeated context (system prompts, tool definitions, RAG documents). The only way to know which is cheaper for your workload is to plug in your actual volume and token sizes, which is exactly what this calculator does.
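
The caching effect on the balanced-tier comparison can be checked with a short sketch. It uses the calculator's stated simplifications (80% of input tokens at the cached-read rate, roughly 10% of the input price for Anthropic, no caching modeled for OpenAI) and ignores cache-write charges. Function and variable names are hypothetical.

```python
def blended_input_rate(input_price: float, cached_fraction: float,
                       cached_multiplier: float) -> float:
    """Effective $/MTok when part of the input is billed at the cached rate."""
    return input_price * ((1 - cached_fraction) + cached_fraction * cached_multiplier)


def monthly_cost(calls: int, input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Total monthly cost in USD for rates given in $/MTok."""
    return calls * (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


SONNET_OUT, GPT5_IN, GPT5_OUT = 15.00, 2.50, 10.00
sonnet_cached_in = blended_input_rate(3.00, cached_fraction=0.80, cached_multiplier=0.10)

# Input-to-output token ratio above which cached Sonnet 4.6 undercuts GPT-5.
break_even_ratio = (SONNET_OUT - GPT5_OUT) / (GPT5_IN - sonnet_cached_in)
print(f"Break-even input/output ratio: {break_even_ratio:.2f}")

calls = 1_000 * 30
for input_tokens in (2_000, 8_000):
    sonnet = monthly_cost(calls, input_tokens, 800, sonnet_cached_in, SONNET_OUT)
    gpt5 = monthly_cost(calls, input_tokens, 800, GPT5_IN, GPT5_OUT)
    print(f"{input_tokens} input tokens: Sonnet 4.6 cached ${sonnet:,.2f}, GPT-5 ${gpt5:,.2f}")
```

Under these assumptions the gap reverses once input is a little more than 3x output. With 800 output tokens per call, GPT-5 is still cheaper at 2,000 input tokens ($390.00 vs $410.40), while cached Sonnet 4.6 is cheaper at 8,000 input tokens ($561.60 vs $840.00).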

How do I calculate my LLM API costs manually?

The underlying formula is simple arithmetic. For a given model:

```text
monthly_calls = calls_per_day * working_days
input_tokens_per_month = monthly_calls * avg_input_tokens
output_tokens_per_month = monthly_calls * avg_output_tokens

input_cost = (input_tokens_per_month / 1_000_000) * price_per_mtok_input
output_cost = (output_tokens_per_month / 1_000_000) * price_per_mtok_output

total_cost = input_cost + output_cost
cost_per_call = total_cost / monthly_calls
```

LLMCostCalc runs this same calculation for every selected model simultaneously, with an additional adjustment when context caching is enabled: 80% of input tokens are billed at the model's cached-read rate instead of the standard input rate.
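
The formula above translates directly into a runnable function. This is a minimal sketch without the caching adjustment. The `Workload` class and `monthly_cost` function are hypothetical names, not the tool's code (which runs as JavaScript in the browser).

```python
from dataclasses import dataclass


@dataclass
class Workload:
    calls_per_day: int
    working_days: int
    avg_input_tokens: int
    avg_output_tokens: int


def monthly_cost(w: Workload, input_price: float, output_price: float) -> dict:
    """Apply the formula above; prices are in $ per million tokens."""
    monthly_calls = w.calls_per_day * w.working_days
    input_cost = monthly_calls * w.avg_input_tokens / 1_000_000 * input_price
    output_cost = monthly_calls * w.avg_output_tokens / 1_000_000 * output_price
    total = input_cost + output_cost
    return {
        "input": input_cost,
        "output": output_cost,
        "total": total,
        "per_call": total / monthly_calls,
    }


# Claude Sonnet 4.6 ($3 / $15 per MTok) at the reference workload.
full_month = monthly_cost(Workload(1_000, 30, 2_000, 800), 3.00, 15.00)
# Same workload running on 22 business days only.
business_days = monthly_cost(Workload(1_000, 22, 2_000, 800), 3.00, 15.00)

print(f"30 days: ${full_month['total']:,.2f}/mo, ${full_month['per_call']:.3f}/call")
print(f"22 days: ${business_days['total']:,.2f}/mo, ${business_days['per_call']:.3f}/call")
```

Changing working days scales the monthly total ($540.00 vs $396.00 here) but leaves cost per call unchanged at $0.018.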

Does Anthropic or Google offer context caching to reduce costs?

Yes. Anthropic's prompt caching and Google's context caching both let you cache large, frequently reused portions of your prompt (system instructions, tool schemas, RAG documents) so that subsequent calls are billed at a fraction of the standard input token price for the cached portion.

The 'Context caching savings' toggle in LLMCostCalc models this by billing 80% of input tokens at the cached-read rate (roughly 10% of the standard input price for Anthropic, 25% for Google) and the remaining 20% at the standard rate. This is a simplification: actual savings depend on your cache hit rate and how your provider charges for cache writes.
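
To see how sensitive the result is to the cached fraction, the adjustment can be sketched as below. The 80% default and the 10% / 25% multipliers come from the text above. The function name and the 50% scenario are illustrative, and cache-write charges are deliberately left out, as in the calculator's simplification.

```python
# Cached-read price as a fraction of the standard input price (approximate).
CACHED_READ_MULTIPLIER = {"anthropic": 0.10, "google": 0.25}


def cached_input_cost(standard_input_cost: float, provider: str,
                      cached_fraction: float = 0.80) -> float:
    """Monthly input cost when cached_fraction of tokens hit the cache."""
    multiplier = CACHED_READ_MULTIPLIER[provider]
    cached = standard_input_cost * cached_fraction * multiplier
    uncached = standard_input_cost * (1 - cached_fraction)
    return cached + uncached


# Standard monthly input costs at 1,000 calls/day, 2,000 input tokens, 30 days.
sonnet_input = cached_input_cost(180.00, "anthropic")       # Claude Sonnet 4.6
gemini_pro_input = cached_input_cost(75.00, "google")       # Gemini 2.5 Pro
sonnet_half_hit = cached_input_cost(180.00, "anthropic", cached_fraction=0.50)

print(f"Sonnet 4.6 input, 80% cached:     ${sonnet_input:,.2f}")
print(f"Gemini 2.5 Pro input, 80% cached: ${gemini_pro_input:,.2f}")
print(f"Sonnet 4.6 input, 50% cached:     ${sonnet_half_hit:,.2f}")
```

Output cost is unaffected by caching, so the Sonnet 4.6 total at 80% cached is $50.40 + $360.00 = $410.40. If only half the input tokens hit the cache, the input cost nearly doubles to $99.00.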

Does this calculator send my usage data anywhere?

No. All pricing data is a hardcoded JSON object and all calculations run in JavaScript in your browser. Nothing is sent to any server, there's no API call, and no account is required.

What is the cheapest LLM API for high volume?

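At very high call volumes, the fast/lightweight tier models dominate: Gemini 2.5 Flash ($0.075 / $0.30 per MTok), GPT-5 mini ($0.15 / $0.60 per MTok), and Claude Haiku 4.5 ($0.80 / $4 per MTok) are typically the cheapest options. At 1,000 calls/day with 2,000 input and 800 output tokens, they range from roughly 9x cheaper (Haiku 4.5 vs GPT-5 Pro) to over 200x cheaper (Gemini 2.5 Flash vs Opus 4.5) than flagship models for the same volume.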

These models are well suited for classification, extraction, summarization, and other tasks that don't require frontier reasoning. A common pattern is to route routine, high-volume requests to a fast model and reserve flagship models (Opus, GPT-5 Pro) for tasks that genuinely need them. Select only the fast-tier models in the calculator to compare them directly at your volume.
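
The routing pattern can be costed with a simple weighted average. This sketch assumes routed and escalated requests have the same average token sizes, which may not hold in practice, and the 90/10 split is an illustrative figure. The function name is hypothetical.

```python
def blended_monthly_cost(fast_cost: float, flagship_cost: float,
                         fast_share: float) -> float:
    """Monthly cost when fast_share of calls go to the fast model.

    fast_cost and flagship_cost are the monthly totals each model would
    cost if it handled 100% of the traffic.
    """
    return fast_share * fast_cost + (1 - fast_share) * flagship_cost


# Monthly totals at 1,000 calls/day, 2,000 input + 800 output tokens, 30 days.
HAIKU_TOTAL = 144.00   # Claude Haiku 4.5
OPUS_TOTAL = 2700.00   # Claude Opus 4.5

routed = blended_monthly_cost(HAIKU_TOTAL, OPUS_TOTAL, fast_share=0.90)
print(f"90% Haiku / 10% Opus: ${routed:,.2f}/mo vs ${OPUS_TOTAL:,.2f}/mo all-Opus")
```

With 90% of calls routed to Haiku 4.5, the blended bill is about $399.60/month, and the 10% of traffic still sent to Opus 4.5 accounts for roughly two thirds of it.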
