AI API cost calculator

Use this formula:

requests per month = requests per day * active days per month (multiplied by team size if the daily figure is per user)
monthly input cost = input tokens per request * requests per month * input price per 1M tokens / 1,000,000
monthly output cost = output tokens per request * requests per month * output price per 1M tokens / 1,000,000
monthly API cost = monthly input cost + monthly output cost + platform/tool fees
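
As a minimal sketch, the formula above can be written as one Python function. The function name, parameter names, and the example numbers are illustrative assumptions, not real provider prices.

```python
def monthly_api_cost(
    input_tokens_per_request: float,
    output_tokens_per_request: float,
    requests_per_month: float,
    input_price_per_million: float,
    output_price_per_million: float,
    platform_fees: float = 0.0,
) -> float:
    """Return the estimated monthly cost in the same currency as the prices."""
    input_cost = (
        input_tokens_per_request * requests_per_month * input_price_per_million / 1_000_000
    )
    output_cost = (
        output_tokens_per_request * requests_per_month * output_price_per_million / 1_000_000
    )
    return input_cost + output_cost + platform_fees


# Hypothetical example: 2,000 input and 500 output tokens per request,
# 3,000 requests per month, $1.00 / $4.00 per 1M tokens, $10 in fees.
estimate = monthly_api_cost(2_000, 500, 3_000, 1.00, 4.00, platform_fees=10.0)
print(f"${estimate:.2f}")  # $22.00
```

Here the input and output sides each contribute $6, and the flat fee makes up the rest.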

The calculator should ask for:

  • Requests per day.
  • Active days per month.
  • Average input tokens per request.
  • Average output tokens per request.
  • Model input price per 1 million tokens.
  • Model output price per 1 million tokens.
  • Team size.
  • Optional cached-input percentage.
  • Optional platform or automation fees.
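
One way to wire these inputs together is sketched below. It makes two assumptions that are not stated above: requests per day is treated as a per-user figure and multiplied by team size, and cached input tokens are billed at a fraction of the normal input price (the hypothetical `cached_price_multiplier`). Check your provider's pricing page for how caching is actually billed.

```python
from dataclasses import dataclass


@dataclass
class CalculatorInputs:
    requests_per_day: float            # ASSUMPTION: per user, not per team
    active_days_per_month: float
    input_tokens_per_request: float
    output_tokens_per_request: float
    input_price_per_million: float
    output_price_per_million: float
    team_size: int = 1
    cached_input_fraction: float = 0.0    # share of input tokens served from cache, 0.0 to 1.0
    cached_price_multiplier: float = 1.0  # ASSUMPTION: cached tokens cost this fraction of the input price
    platform_fees: float = 0.0


def estimate_monthly_cost(inputs: CalculatorInputs) -> float:
    """Estimate monthly cost from the calculator inputs."""
    requests_per_month = (
        inputs.requests_per_day * inputs.active_days_per_month * inputs.team_size
    )
    total_input_tokens = inputs.input_tokens_per_request * requests_per_month
    cached_tokens = total_input_tokens * inputs.cached_input_fraction
    uncached_tokens = total_input_tokens - cached_tokens
    # Cached tokens are counted at a reduced weight before applying the input price.
    billable_input_tokens = uncached_tokens + cached_tokens * inputs.cached_price_multiplier
    input_cost = billable_input_tokens * inputs.input_price_per_million / 1_000_000
    output_cost = (
        inputs.output_tokens_per_request
        * requests_per_month
        * inputs.output_price_per_million
        / 1_000_000
    )
    return input_cost + output_cost + inputs.platform_fees


# Hypothetical team: 4 users, 50 requests each per day, 20 active days.
example = CalculatorInputs(
    requests_per_day=50,
    active_days_per_month=20,
    input_tokens_per_request=1_500,
    output_tokens_per_request=500,
    input_price_per_million=2.0,
    output_price_per_million=8.0,
    team_size=4,
    cached_input_fraction=0.5,
    cached_price_multiplier=0.5,
)
print(f"${estimate_monthly_cost(example):.2f}")
```

If your requests-per-day figure already covers the whole team, leave `team_size` at 1.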

If the user does not know token counts, use rough starting assumptions:

  • Short prompt: 300 to 1,000 input tokens.
  • Normal chat or task: 1,000 to 3,000 input tokens.
  • Document summary: 5,000 to 50,000+ input tokens.
  • Short answer: 200 to 800 output tokens.
  • Long answer: 1,000 to 3,000 output tokens.
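
When token counts are unknown, these ranges can be turned into a low and high estimate rather than a single number. The sketch below is one way to do that. The dictionary keys and function name are illustrative, and the open-ended "50,000+" is capped at 50,000 as a simplifying assumption.

```python
# Rough starting ranges from the list above, as (low, high) token counts.
# ASSUMPTION: the "+" on the document-summary range is not modeled.
INPUT_TOKEN_RANGES = {
    "short_prompt": (300, 1_000),
    "normal_task": (1_000, 3_000),
    "document_summary": (5_000, 50_000),
}
OUTPUT_TOKEN_RANGES = {
    "short_answer": (200, 800),
    "long_answer": (1_000, 3_000),
}


def cost_range(
    input_kind: str,
    output_kind: str,
    requests_per_month: float,
    input_price_per_million: float,
    output_price_per_million: float,
) -> tuple[float, float]:
    """Return (low, high) monthly cost for the chosen request type."""
    in_low, in_high = INPUT_TOKEN_RANGES[input_kind]
    out_low, out_high = OUTPUT_TOKEN_RANGES[output_kind]

    def cost(input_tokens: int, output_tokens: int) -> float:
        per_request = (
            input_tokens * input_price_per_million
            + output_tokens * output_price_per_million
        )
        return per_request * requests_per_month / 1_000_000

    return cost(in_low, out_low), cost(in_high, out_high)


# Hypothetical prices: $1.00 input and $4.00 output per 1M tokens.
low, high = cost_range("normal_task", "short_answer", 1_000, 1.0, 4.0)
print(f"${low:.2f} to ${high:.2f} per month")
```

Replace the ranges with measured token counts once you have real usage logs.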

Example pricing anchors

Official provider pages commonly quote prices per 1 million tokens. At the time of writing, the official pricing pages for OpenAI, Anthropic, Google Gemini, Mistral, and OpenRouter all list prices per model. OpenRouter pricing is especially model-specific because it routes across many models and providers.

For planning, do not average all providers together. Choose one model tier:

  • Low-cost model: useful for classification, routing, simple extraction, short summaries.
  • Mid-tier model: useful for normal automation, writing, support drafts, coding helpers.
  • Premium model: useful for hard reasoning, final answers, complex coding, and high-stakes review.

Then calculate input and output separately. A workflow that generates long answers can cost much more than a workflow that reads long inputs and returns short labels.
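
To illustrate why the two sides should be calculated separately, here is a small comparison under made-up prices ($1.00 input and $5.00 output per 1M tokens) and made-up token counts. None of these numbers come from a real provider.

```python
def workflow_cost(
    input_tokens: float,
    output_tokens: float,
    requests: float,
    input_price_per_million: float,
    output_price_per_million: float,
) -> tuple[float, float]:
    """Return (input_cost, output_cost) for one month."""
    input_cost = input_tokens * requests * input_price_per_million / 1_000_000
    output_cost = output_tokens * requests * output_price_per_million / 1_000_000
    return input_cost, output_cost


# ASSUMPTION: hypothetical prices and volumes, chosen only for illustration.
INPUT_PRICE = 1.00
OUTPUT_PRICE = 5.00
REQUESTS = 10_000

# Short prompt in, long answer out.
long_answers = workflow_cost(500, 3_000, REQUESTS, INPUT_PRICE, OUTPUT_PRICE)
# Long input in, short label out.
short_labels = workflow_cost(5_000, 20, REQUESTS, INPUT_PRICE, OUTPUT_PRICE)

print("long answers:", long_answers, "total", sum(long_answers))
print("short labels:", short_labels, "total", sum(short_labels))
```

With these numbers the long-answer workflow totals $155 and the label workflow $51, even though the label workflow reads ten times more input per request.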

Usage scenarios

Scenario                | Assumption                                       | Planning range
Personal experiments    | 20 to 100 short requests/day                     | $1 to $25/month
Creator workflow        | drafts, summaries, rewrites, light automations   | $10 to $150/month
Small team              | 3 to 10 users with mixed daily workflows         | $50 to $1,000+/month
Document-heavy workflow | long PDFs, transcripts, knowledge base tasks     | $100 to $2,000+/month
SaaS feature            | user-facing feature with unpredictable usage     | $500 to $10,000+/month
Agent workflow          | multi-step tool calls and retries                | highly variable; monitor from day one

The low end assumes small models, short prompts, short outputs, and modest request volume. The middle assumes normal daily use and some long prompts. The high end assumes premium models, long outputs, document processing, retries, or many users.

Local AI cost calculator extension

Local AI has no per-token provider bill when the model runs on your own machine, but it is not free. Add:

  • Hardware purchase price.
  • Useful life in months.
  • Electricity cost per kWh.
  • Estimated watts under load.
  • Hours used per day.
  • Optional maintenance or admin time.

Use:

monthly hardware cost = hardware price / useful life in months
monthly electricity cost = watts / 1000 * hours per day * days per month * cost per kWh
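
The two local-cost formulas can be combined in a short function, sketched here. The optional `maintenance_per_month` parameter is an added assumption for the maintenance or admin time item, and the example figures are hypothetical.

```python
def monthly_local_cost(
    hardware_price: float,
    useful_life_months: float,
    watts_under_load: float,
    hours_per_day: float,
    days_per_month: float,
    cost_per_kwh: float,
    maintenance_per_month: float = 0.0,  # ASSUMPTION: a flat monthly amount you estimate yourself
) -> float:
    """Return amortized hardware plus electricity (plus optional maintenance)."""
    hardware = hardware_price / useful_life_months
    electricity = watts_under_load / 1000 * hours_per_day * days_per_month * cost_per_kwh
    return hardware + electricity + maintenance_per_month


# Hypothetical desktop: $1,800 over 36 months, 400 W for 4 hours a day,
# 30 days a month, electricity at $0.15 per kWh.
local = monthly_local_cost(1_800, 36, 400, 4, 30, 0.15)
print(f"${local:.2f} per month")
```

In this example amortization ($50) dominates electricity (about $7), which is typical when the machine runs only a few hours a day.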

Planning ranges:

  • Existing PC: $0 to $20/month incremental cost for light use.
  • Budget local AI desktop: $25 to $80/month amortized over 24 to 36 months.
  • Strong desktop: $60 to $150/month amortized.
  • Workstation: $150 to $400+/month amortized.
  • Hybrid local plus API: local hardware cost plus $5 to $500+/month API usage.

These estimates exclude your time, backups, repairs, and the quality you give up if a local model is weaker than a cloud model.

What moves AI API cost most

Output length

Providers often price output tokens higher than input tokens. A workflow that asks for long essays, detailed code, or repeated rewrites can cost more than a prompt-size estimate suggests.

Model tier

Premium models cost more per token and are generally better at difficult work. Use them where quality matters, and use smaller models for routing, extraction, cleanup, and first drafts.

Long context and documents

Long documents can dominate input cost. Chunk documents, summarize in stages, or use retrieval instead of repeatedly sending the full source.
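
A rough comparison shows the effect. The sketch below assumes a 40,000-token document, a retrieval step that sends about 3,000 tokens of relevant chunks instead, and a hypothetical input price of $2.00 per 1M tokens. It ignores any cost of building the retrieval index.

```python
def monthly_input_cost(
    input_tokens_per_request: float,
    requests_per_month: float,
    input_price_per_million: float,
) -> float:
    """Input-side cost only; output cost is the same in both cases here."""
    return input_tokens_per_request * requests_per_month * input_price_per_million / 1_000_000


# ASSUMPTION: hypothetical sizes and price, for illustration only.
FULL_DOCUMENT_TOKENS = 40_000
RETRIEVED_CHUNK_TOKENS = 3_000
REQUESTS_PER_MONTH = 500
INPUT_PRICE = 2.00

full_resend = monthly_input_cost(FULL_DOCUMENT_TOKENS, REQUESTS_PER_MONTH, INPUT_PRICE)
with_retrieval = monthly_input_cost(RETRIEVED_CHUNK_TOKENS, REQUESTS_PER_MONTH, INPUT_PRICE)

print(f"full document each time: ${full_resend:.2f}")
print(f"retrieved chunks only:   ${with_retrieval:.2f}")
```

Under these assumptions, resending the full document costs $40 a month on input alone, against $3 with retrieval.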

Agents and retry loops

Agents can multiply requests. A single user action may trigger planning, tool calls, retries, validation, and final response generation. Put limits in place early.
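
One simple limit is a hard cap on retries per step, which also gives a worst-case call count to plug into the cost formula. This is a minimal sketch with hypothetical function names; a real agent framework may expose its own limit settings.

```python
from typing import Callable


def worst_case_calls(steps_per_action: int, max_retries_per_step: int) -> int:
    """Upper bound on model calls that one user action can trigger."""
    return steps_per_action * (1 + max_retries_per_step)


def run_with_retry_limit(step: Callable[[], bool], max_retries: int) -> int:
    """Run step() until it returns True or the retry budget is spent.

    Returns the number of calls made so they can be logged.
    """
    calls = 0
    for _ in range(1 + max_retries):
        calls += 1
        if step():
            break
    return calls


# Hypothetical agent: plan, tool call, validate, final answer = 4 steps,
# each allowed up to 2 retries.
print(worst_case_calls(4, 2))  # 12
```

Multiplying the expected requests per month by this bound gives a ceiling for budgeting rather than a typical cost.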

Automations

n8n, Dify, Zapier, Make, custom workers, and cron jobs can run silently. A workflow that runs every hour can become expensive even with a cheap model.
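
To see how a schedule adds up, the sketch below estimates the cost of an hourly job. The item count per run, token counts, and prices are all hypothetical and stand in for a cheap model.

```python
def automation_monthly_cost(
    runs_per_day: float,
    calls_per_run: float,
    input_tokens_per_call: float,
    output_tokens_per_call: float,
    input_price_per_million: float,
    output_price_per_million: float,
    days_per_month: float = 30,
) -> tuple[float, float]:
    """Return (model calls per month, monthly cost) for a scheduled job."""
    calls = runs_per_day * days_per_month * calls_per_run
    cost = (
        calls
        * (
            input_tokens_per_call * input_price_per_million
            + output_tokens_per_call * output_price_per_million
        )
        / 1_000_000
    )
    return calls, cost


# ASSUMPTION: an hourly job (24 runs/day) that processes 50 items per run,
# 4,000 input and 500 output tokens per item, $0.20 / $0.80 per 1M tokens.
calls, cost = automation_monthly_cost(24, 50, 4_000, 500, 0.20, 0.80)
print(f"{calls:,.0f} calls, about ${cost:.2f} per month")
```

That is 36,000 model calls a month from a single workflow, before any retries.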

When cloud AI is usually better

Choose cloud AI when you need the best available model quality, reliable APIs, scaling, observability, and no local hardware maintenance. Cloud is also better when usage is occasional and a hardware purchase would not pay for itself.
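
Whether hardware pays back can be checked with a simple break-even estimate, sketched below. It assumes the local machine fully replaces the API spend being compared and that quality is acceptable for the task. The function name and figures are hypothetical.

```python
from typing import Optional


def payback_months(
    hardware_price: float,
    monthly_api_cost_avoided: float,
    monthly_local_running_cost: float,
) -> Optional[float]:
    """Months until the hardware purchase is recovered, or None if it never is."""
    monthly_saving = monthly_api_cost_avoided - monthly_local_running_cost
    if monthly_saving <= 0:
        return None  # running locally costs as much as, or more than, the API
    return hardware_price / monthly_saving


# Occasional use: $15/month of API spend avoided, $7/month in electricity.
print(payback_months(1_800, 15, 7))    # 225.0
# Heavy use: $120/month of API spend avoided, $20/month in electricity.
print(payback_months(1_800, 120, 20))  # 18.0
```

A payback period longer than the hardware's useful life points toward staying on cloud APIs.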

When local AI is usually better

Choose local AI when you already own capable hardware, need offline use, want private drafting, have repeated low-stakes tasks, or need a local model backend for experiments. Local AI is not automatically better for production quality.

Hybrid setup

A good hybrid stack uses local AI for drafts, internal notes, summaries, routing, and privacy-sensitive experiments. It uses cloud AI for final review, complex reasoning, code changes that need high reliability, and production user-facing features.

Common mistakes

  • Do not compare a tiny local model to a premium cloud model as if quality is equal.
  • Do not ignore output tokens.
  • Do not let agents retry forever.
  • Do not process full documents repeatedly when a retrieval workflow would do.
  • Do not launch a SaaS AI feature without usage limits, logs, and billing alerts.

Bottom line

Calculate AI API cost from requests, tokens, model prices, and days per month. Compare that with local hardware amortization, electricity, and maintenance. Cloud is often best for quality and scaling. Local AI is often best for private, repeated, or offline work. Many practical systems use both.