Dev Encyclopedia
© 2026 Dev Encyclopedia

Free · No login · Runs in your browser

What will your LLM API bill be next month?

Enter your expected call volume and prompt sizes. LLMCostCalc shows your monthly cost for every major Claude, GPT-5, and Gemini model, side by side, with context caching savings and a best-value recommendation.

How LLMCostCalc works

  1. Enter your expected call volume

     Type your API calls per day, or pick a preset from 100/day to 100,000/day. This is the single biggest driver of your monthly cost, since every other number is multiplied by it.

  2. Set your average prompt and response sizes

     Enter average input tokens (your prompt, system message, and any context) and average output tokens (the model's response). Presets cover short prompts (~500 tokens) through extended context (~32,000 tokens).

  3. Adjust working days per month

     Defaults to 30. If your workload only runs on business days, set this to 20-22 to get a more accurate monthly figure.

  4. Pick which models to compare

     All 8 flagship, balanced, and fast-tier models across Anthropic, OpenAI, and Google are selected by default. Deselect any you don't need, or use 'Select all' / 'Deselect all' to reset.

  5. Toggle context caching if applicable

     If most of your input tokens are repeated across calls (a system prompt, tool schemas, RAG documents), enable caching savings. The calculator assumes 80% of input tokens are billed at the cached-read rate for Anthropic and Google models.

  6. Read the comparison table and recommendation

     Each model shows its monthly input, output, and total cost, plus cost per call. Rows are color-coded by cost tier, the cheapest option is flagged, and a plain-English recommendation highlights the best cost-to-quality tradeoff.
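The first three inputs combine into the monthly call and token totals that every cost figure is built on. A minimal Python sketch, assuming the calculator simply multiplies the inputs together; the function name `monthly_volume` is illustrative and not taken from the tool:

```python
def monthly_volume(calls_per_day: int, input_tokens: int, output_tokens: int,
                   working_days: int = 30) -> dict:
    """Return monthly call and token totals for one workload."""
    calls = calls_per_day * working_days
    return {
        "calls": calls,
        "input_tokens": calls * input_tokens,
        "output_tokens": calls * output_tokens,
        "total_tokens": calls * (input_tokens + output_tokens),
    }


full_month = monthly_volume(1_000, 2_000, 800)                      # default 30 days
business_days = monthly_volume(1_000, 2_000, 800, working_days=22)  # weekdays only

print(full_month["calls"], full_month["total_tokens"])        # 30000 84000000
print(business_days["calls"], business_days["total_tokens"])  # 22000 61600000
```

Dropping from 30 to 22 working days cuts every downstream cost by the same proportion, because all costs scale linearly with monthly calls.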

What each part of the output means

The comparison table and recommendation box translate raw per-token pricing into numbers you can act on.

Cost tiers (row color)

Each model's total monthly cost is color-coded so you can scan for budget fit at a glance: green is under $50/month, yellow is $50-500/month, orange is $500-2,000/month, and red is over $2,000/month. The cheapest selected model is flagged with a 'Cheapest' badge.

1,000 calls/day, 2,000 input + 800 output tokens, 30 days

Gemini 2.5 Flash    $11.70/mo    (green — Cheapest)
Claude Haiku 4.5    $144.00/mo   (yellow)
GPT-5 Pro           $1,320.00/mo (orange)
Claude Opus 4.5     $2,700.00/mo (red)
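
The tier thresholds can be expressed as a small lookup. This is a sketch rather than the tool's actual code: the name `cost_tier` is hypothetical, and the handling of totals that land exactly on $500 is an assumption, since the stated ranges overlap at that boundary.

```python
def cost_tier(total_per_month: float) -> str:
    """Map a monthly total in USD to the row color described above."""
    if total_per_month < 50:
        return "green"
    # ASSUMPTION: exactly $500 counts as orange; the article's ranges
    # ($50-500 and $500-2,000) do not say which side the boundary falls on.
    if total_per_month < 500:
        return "yellow"
    if total_per_month <= 2_000:  # red is "over $2,000", so $2,000 stays orange
        return "orange"
    return "red"


# Monthly totals from the example above.
totals = {
    "Gemini 2.5 Flash": 11.70,
    "Claude Haiku 4.5": 144.00,
    "GPT-5 Pro": 1_320.00,
    "Claude Opus 4.5": 2_700.00,
}
cheapest = min(totals, key=totals.get)

for model, total in totals.items():
    badge = " (Cheapest)" if model == cheapest else ""
    print(f"{model}: ${total:,.2f}/mo -> {cost_tier(total)}{badge}")
```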

Input/Output split and cost per call

Input cost and output cost are shown separately because output tokens are typically priced 3-5x higher than input tokens. If your output cost dominates, shortening responses (a lower max_tokens, or instructions that ask for concise answers) saves more than trimming the prompt. Cost per call is useful for attributing cost to a specific feature: multiply it by that feature's expected call volume.

Claude Sonnet 4.6 at 1,000 calls/day, 2,000 input + 800 output tokens, 30 days:

Input:  $180.00/mo  (60,000,000 tokens × $3/MTok)
Output: $360.00/mo  (24,000,000 tokens × $15/MTok)
Total:  $540.00/mo
Cost per call: $0.018
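
To attribute cost to a single feature, the per-call figure can be computed directly and scaled by that feature's volume. The sketch below uses the Sonnet 4.6 rates from this article; the 5,000 calls/month feature is a made-up example, and `cost_per_call` is an illustrative helper name.

```python
SONNET_INPUT_PER_MTOK = 3.00    # USD per million input tokens (from this article)
SONNET_OUTPUT_PER_MTOK = 15.00  # USD per million output tokens (from this article)


def cost_per_call(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """USD cost of a single call at the given per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


per_call = cost_per_call(2_000, 800, SONNET_INPUT_PER_MTOK, SONNET_OUTPUT_PER_MTOK)

# Share of each call's cost that comes from output tokens.
output_share = (800 * SONNET_OUTPUT_PER_MTOK / 1_000_000) / per_call

# HYPOTHETICAL feature: 5,000 calls per month with the same token sizes.
feature_calls_per_month = 5_000
feature_cost = per_call * feature_calls_per_month

print(f"Cost per call: ${per_call:.4f}")
print(f"Output share:  {output_share:.0%}")
print(f"Feature cost:  ${feature_cost:.2f}/mo")
```

Here output tokens account for about two thirds of each call's cost, which is the situation where shortening responses pays off most.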

Recommendation

A plain-English summary that picks the highest-tier model (flagship, balanced, or fast) whose monthly cost still falls in the 'moderate cost or below' band. If the absolute cheapest model is a different one (usually a fast-tier model), it is named separately so you can decide whether the quality tradeoff is worth the saving.

"For this volume, Gemini 2.5 Pro (balanced tier) gives the
best performance-to-cost ratio at $195.00/month. If cost is the
only factor, Gemini 2.5 Flash is cheapest at $11.70/month."
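
The exact selection rule is not documented here, so the following is only one plausible reading of it. It assumes that "moderate cost" means at most $500/month (the top of the yellow tier) and that ties within a tier go to the cheaper model. Both are assumptions, as are the names `recommend` and `MODERATE_CAP`. Under those assumptions the sketch reproduces the sample recommendation above.

```python
# (model, tier, monthly total in USD) for the example workload in this article.
MODELS = [
    ("Gemini 2.5 Flash", "fast", 11.70),
    ("GPT-5 mini", "fast", 23.40),
    ("Claude Haiku 4.5", "fast", 144.00),
    ("Gemini 2.5 Pro", "balanced", 195.00),
    ("GPT-5", "balanced", 390.00),
    ("Claude Sonnet 4.6", "balanced", 540.00),
    ("GPT-5 Pro", "flagship", 1_320.00),
    ("Claude Opus 4.5", "flagship", 2_700.00),
]
TIER_RANK = {"fast": 0, "balanced": 1, "flagship": 2}
MODERATE_CAP = 500.0  # ASSUMPTION: "moderate cost" = upper bound of the yellow tier


def recommend(models, cap=MODERATE_CAP):
    """Return (best_value, cheapest) model names; best_value is None if nothing fits."""
    cheapest = min(models, key=lambda m: m[2])[0]
    affordable = [m for m in models if m[2] <= cap]
    if not affordable:
        return None, cheapest
    # ASSUMPTION: prefer the highest tier, then the lowest cost within that tier.
    best = max(affordable, key=lambda m: (TIER_RANK[m[1]], -m[2]))
    return best[0], cheapest


print(recommend(MODELS))  # ('Gemini 2.5 Pro', 'Gemini 2.5 Flash')
```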

Current LLM API pricing (June 2026)

Per-million-token rates used by this calculator. Prices change frequently, so verify current rates against each provider's pricing page. The same table is also available inside the tool.

Model              Tier      Input $/MTok  Output $/MTok
Claude Opus 4.5    Flagship  $15.00        $75.00
Claude Sonnet 4.6  Balanced  $3.00         $15.00
Claude Haiku 4.5   Fast      $0.80         $4.00
GPT-5 Pro          Flagship  $10.00        $30.00
GPT-5              Balanced  $2.50         $10.00
GPT-5 mini         Fast      $0.15         $0.60
Gemini 2.5 Pro     Balanced  $1.25         $5.00
Gemini 2.5 Flash   Fast      $0.075        $0.30

1 MTok = 1,000,000 tokens, roughly 750,000 words of English text. Anthropic and Google models also support context caching: cached input tokens are billed at roughly 10% (Anthropic) or 25% (Google) of the standard input rate.

When to use LLMCostCalc

Situation: What to do

  • Planning a new RAG pipeline: Enter expected daily query volume with your typical retrieved-context size as input tokens, and compare against your budget ceiling
  • Deciding whether to downgrade from Sonnet to Haiku: Select both models with your current volume to see the exact dollar difference before changing your production model
  • Presenting an AI infrastructure estimate to a manager: Set your projected volume and screenshot the comparison table and recommendation for the proposal
  • Choosing between Claude Opus and GPT-5 Pro for an agentic workflow: Select only the flagship-tier models and compare the roughly 1.5-2.5x cost difference at your expected session volume
  • Evaluating prompt caching for a large system prompt: Toggle context caching savings on and off with the same volume to see the exact monthly saving
  • Comparing direct API costs to GitHub Copilot's token billing: Use this calculator for the direct-API side, then compare against the Copilot Credit Calculator for the same workload

Frequently Asked Questions

What is LLMCostCalc and how does it work?

LLMCostCalc is a free browser-based calculator that estimates your monthly LLM API bill across Anthropic Claude, OpenAI GPT-5, and Google Gemini models. Enter your expected call volume and average prompt/response sizes, and it calculates the input and output token cost for every model at current per-token pricing.

Instead of bouncing between three pricing pages and doing the multiplication yourself, you get a side-by-side cost comparison, a color-coded cost tier for each model, and a plain-English recommendation for the best cost-to-quality tradeoff.

How much does the Claude API cost per month?

It depends entirely on call volume, prompt size, and which Claude model you use. As a reference point, at 1,000 calls/day with 2,000 input tokens and 800 output tokens per call (30-day month), Claude Sonnet 4.6 costs roughly $540/month, Claude Haiku 4.5 costs roughly $144/month, and Claude Opus 4.5 costs roughly $2,700/month.

Enable the context caching toggle if most of your prompt is repeated across calls (a system prompt, RAG context, or few-shot examples): caching 80% of input tokens drops the Sonnet 4.6 example above to roughly $410/month.

Is Claude cheaper than GPT-5?

                 Anthropic Claude                            OpenAI GPT-5
Flagship tier    Opus 4.5: $15 / $75 per MTok                GPT-5 Pro: $10 / $30 per MTok
Balanced tier    Sonnet 4.6: $3 / $15 per MTok               GPT-5: $2.50 / $10 per MTok
Fast tier        Haiku 4.5: $0.80 / $4 per MTok              GPT-5 mini: $0.15 / $0.60 per MTok
Context caching  Yes, ~10% of input price for cached reads   Not modeled in this calculator

At equivalent tiers, GPT-5 models are typically cheaper per token, especially at the fast/lightweight tier. Claude's prompt caching can close or reverse that gap for workloads with a large, repeated context (system prompts, tool definitions, RAG documents). The only way to know which is cheaper for your workload is to plug in your actual volume and token sizes, which is exactly what this calculator does.
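
To see how caching can reverse the gap, the sketch below compares Claude Sonnet 4.6 with caching against GPT-5 without it, mirroring how this calculator models the two providers. The rates and the 80% / 10% caching figures come from this article. The 8,000-token prompt is a hypothetical context-heavy workload, and `monthly_total` is an illustrative helper name.

```python
def monthly_total(calls, in_tok, out_tok, in_rate, out_rate,
                  cached_share=0.0, cached_rate_fraction=1.0):
    """Monthly USD total; cached_share of input is billed at a fraction of in_rate."""
    in_mtok = calls * in_tok / 1_000_000
    out_mtok = calls * out_tok / 1_000_000
    blended_in_rate = in_rate * ((1 - cached_share) + cached_share * cached_rate_fraction)
    return in_mtok * blended_in_rate + out_mtok * out_rate


CALLS = 30_000  # 1,000 calls/day over 30 days
results = {}
for in_tok in (2_000, 8_000):  # 8,000 is a HYPOTHETICAL context-heavy prompt
    sonnet = monthly_total(CALLS, in_tok, 800, 3.00, 15.00,
                           cached_share=0.8, cached_rate_fraction=0.10)
    gpt5 = monthly_total(CALLS, in_tok, 800, 2.50, 10.00)  # caching not modeled
    results[in_tok] = (sonnet, gpt5)
    print(f"{in_tok:,} input tokens: Sonnet 4.6 cached ${sonnet:,.2f}, GPT-5 ${gpt5:,.2f}")
```

At 2,000 input tokens GPT-5 is still cheaper ($390.00 against $410.40), but at 8,000 input tokens the cached Sonnet 4.6 total ($561.60) drops below GPT-5 ($840.00). The comparison ignores cache-write charges and OpenAI's automatic caching discount, so treat it as directional only.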

How do I calculate my LLM API costs manually?

The underlying formula is simple arithmetic. For a given model:

monthly_calls = calls_per_day * working_days
input_tokens_per_month = monthly_calls * avg_input_tokens
output_tokens_per_month = monthly_calls * avg_output_tokens

input_cost = (input_tokens_per_month / 1_000_000) * price_per_mtok_input
output_cost = (output_tokens_per_month / 1_000_000) * price_per_mtok_output

total_cost = input_cost + output_cost
cost_per_call = total_cost / monthly_calls

LLMCostCalc runs this same calculation for every selected model simultaneously, with an additional adjustment when context caching is enabled: 80% of input tokens are billed at the model's cached-read rate instead of the standard input rate.
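
The formula above translates almost line for line into runnable Python. This is a sketch, not the tool's source (which runs as JavaScript in the browser). The `Price` class and `monthly_cost` function are illustrative names, and only three of the eight models are included to keep it short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


# Rates copied from the pricing table in this article (June 2026).
PRICES = {
    "Claude Sonnet 4.6": Price(3.00, 15.00),
    "GPT-5": Price(2.50, 10.00),
    "Gemini 2.5 Pro": Price(1.25, 5.00),
}


def monthly_cost(price: Price, calls_per_day: int, avg_input_tokens: int,
                 avg_output_tokens: int, working_days: int = 30) -> dict:
    """Apply the manual formula for one model, without caching."""
    monthly_calls = calls_per_day * working_days
    input_cost = monthly_calls * avg_input_tokens / 1_000_000 * price.input_per_mtok
    output_cost = monthly_calls * avg_output_tokens / 1_000_000 * price.output_per_mtok
    total = input_cost + output_cost
    return {
        "input": input_cost,
        "output": output_cost,
        "total": total,
        "per_call": total / monthly_calls,
    }


costs = {name: monthly_cost(p, 1_000, 2_000, 800) for name, p in PRICES.items()}
for name, c in costs.items():
    print(f"{name}: ${c['total']:,.2f}/mo (${c['per_call']:.4f} per call)")
```

With 1,000 calls/day, 2,000 input tokens, and 800 output tokens, this gives $540.00, $390.00, and $195.00 per month respectively.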

Does Anthropic or Google offer context caching to reduce costs?

Yes. Anthropic's prompt caching and Google's context caching both let you cache large, frequently reused portions of your prompt (system instructions, tool schemas, RAG documents) so that subsequent calls are billed at a fraction of the standard input token price for the cached portion.

The 'Context caching savings' toggle in LLMCostCalc models this by billing 80% of input tokens at the cached-read rate (roughly 10% of the standard input price for Anthropic, 25% for Google) and the remaining 20% at the standard rate. This is a simplification: actual savings depend on your cache hit rate and how your provider charges for cache writes.
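
The toggle's effect reduces to a single multiplier on the input bill. The sketch below uses the 80% cached share and the approximate 10% / 25% cached-read rates stated above. Like the calculator, it ignores cache-write charges. The function names are illustrative.

```python
CACHED_SHARE = 0.80  # share of input tokens assumed to be cache reads (per this article)
CACHED_RATE_FRACTION = {"Anthropic": 0.10, "Google": 0.25}  # approximate, per this article


def effective_input_multiplier(provider: str, cached_share: float = CACHED_SHARE) -> float:
    """Fraction of the uncached input bill that remains once caching is applied."""
    return (1 - cached_share) + cached_share * CACHED_RATE_FRACTION[provider]


def cached_total(input_cost: float, output_cost: float, provider: str) -> float:
    """Monthly total with caching; output tokens are never discounted."""
    return input_cost * effective_input_multiplier(provider) + output_cost


# Uncached input/output costs for the example workload in this article.
sonnet = cached_total(180.00, 360.00, "Anthropic")
gemini_pro = cached_total(75.00, 120.00, "Google")

print(f"Anthropic input multiplier: {effective_input_multiplier('Anthropic'):.2f}")
print(f"Google input multiplier:    {effective_input_multiplier('Google'):.2f}")
print(f"Claude Sonnet 4.6 with caching: ${sonnet:,.2f}/mo")
print(f"Gemini 2.5 Pro with caching:    ${gemini_pro:,.2f}/mo")
```

The multipliers work out to 0.28 for Anthropic and 0.40 for Google, so the input bill falls by 72% and 60% respectively while the output bill is unchanged.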

ℹ Info

OpenAI also offers automatic prompt caching on repeated prefixes, but it is not modeled separately in this calculator since the discount is applied automatically and isn't user-configurable in the same way.

Does this calculator send my usage data anywhere?

No. All pricing data is a hardcoded JSON object and all calculations run in JavaScript in your browser. Nothing is sent to any server, there's no API call, and no account is required.

💡 Tip

The calculator works offline once the page is loaded. Your call volume and token estimates never leave your machine.

What is the cheapest LLM API for high volume?

At very high call volumes, the fast/lightweight tier models dominate: Gemini 2.5 Flash ($0.075 / $0.30 per MTok), GPT-5 mini ($0.15 / $0.60 per MTok), and Claude Haiku 4.5 ($0.80 / $4 per MTok) are typically the cheapest options. At the rates above they are roughly 10x to over 200x cheaper than flagship models for the same volume, depending on which pair you compare.

These models are well suited for classification, extraction, summarization, and other tasks that don't require frontier reasoning. A common pattern is to route routine, high-volume requests to a fast model and reserve flagship models (Opus, GPT-5 Pro) for tasks that genuinely need them. Select only the fast-tier models in the calculator to compare them directly at your volume.
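
The routing pattern can be costed as a weighted average of two single-model totals. The sketch below uses the Haiku 4.5 and Opus 4.5 monthly totals from this article's example workload. The routing shares are hypothetical, `blended_monthly_cost` is an illustrative name, and it assumes both routes see the same token sizes.

```python
def blended_monthly_cost(fast_total: float, flagship_total: float,
                         flagship_share: float) -> float:
    """Monthly cost when flagship_share of calls go to the flagship model.

    Both totals are the cost of sending *all* calls to that model, so the
    blend is a weighted average of the two.
    """
    if not 0.0 <= flagship_share <= 1.0:
        raise ValueError("flagship_share must be between 0 and 1")
    return (1 - flagship_share) * fast_total + flagship_share * flagship_total


HAIKU_TOTAL = 144.00    # USD/month if every call goes to Claude Haiku 4.5
OPUS_TOTAL = 2_700.00   # USD/month if every call goes to Claude Opus 4.5

blended = {}
for share in (0.0, 0.10, 0.25, 1.0):  # HYPOTHETICAL routing shares
    blended[share] = blended_monthly_cost(HAIKU_TOTAL, OPUS_TOTAL, share)
    print(f"{share:.0%} of calls to Opus: ${blended[share]:,.2f}/mo")
```

Sending only 10% of calls to Opus costs about $399.60/month in this example, compared with $2,700.00 for routing everything to it.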

Related reading

Cost

GitHub Copilot Token Billing: Real Cost by Workflow

GitHub Copilot switched to AI Credits billing in June 2026. Compare what Copilot charges per workflow against calling Claude, GPT-5, or Gemini directly.

Guide

Caching Strategies Explained

Caching isn't just for databases and CDNs. The same cache-aside principles apply to LLM prompt caching: understand the patterns before enabling it.


Monthly Cost Comparison

30,000 calls/mo · 84,000,000 tokens/mo

Model              Provider · Tier        Input/mo  Output/mo  Total/mo  Cost/call
Gemini 2.5 Flash   Google · Fast          $4.50     $7.20      $11.70    $0.000390  (Cheapest)
GPT-5 mini         OpenAI · Fast          $9.00     $14.40     $23.40    $0.000780
Claude Haiku 4.5   Anthropic · Fast       $48.00    $96.00     $144.00   $0.004800
Gemini 2.5 Pro     Google · Balanced      $75.00    $120.00    $195.00   $0.006500
GPT-5              OpenAI · Balanced      $150.00   $240.00    $390.00   $0.0130
Claude Sonnet 4.6  Anthropic · Balanced   $180.00   $360.00    $540.00   $0.0180
GPT-5 Pro          OpenAI · Flagship      $600.00   $720.00    $1,320    $0.0440
Claude Opus 4.5    Anthropic · Flagship   $900.00   $1,800     $2,700    $0.0900

Cost tiers: green < $50/mo · yellow $50-500/mo · orange $500-2,000/mo · red > $2,000/mo

Recommendation

For this volume, Gemini 2.5 Pro (balanced tier) gives the best performance-to-cost ratio at $195.00/month. If cost is the only factor, Gemini 2.5 Flash is cheapest at $11.70/month.