Tracking API calls across OpenClaw deployments reveals a dramatic spread. The cheapest instances run around $4/month. The most expensive hit $387. Same software, same features enabled, same approximate usage volume. The entire difference comes down to which models handle which tasks and whether anyone bothers to configure the defaults.
If you already understand what OpenClaw costs at a high level, our pricing breakdown guide covers hosting, API basics, and total monthly budgets. This article goes deeper: where your tokens actually go, how to measure what you are spending, and the specific configurations that can cut API bills by 60-80% without degrading agent output quality.
Where Your Tokens Actually Go
Before optimizing anything, you need to understand the consumption pattern. Most people assume their tokens go to the messages they type and the responses they read. That is maybe 10% of the total.
Here is a typical breakdown:
| Token Consumer | Share of Total | Why It Matters |
|---|---|---|
| Conversation context (history) | 40-50% | Every message replays the full conversation so far |
| Tool call definitions + output | 20-30% | Each enabled tool adds tokens to every API call |
| System prompts + memory retrieval | 10-15% | Your workspace files, personality, instructions |
| Model responses (the useful part) | 8-12% | The actual output you read |
| Retries, errors, overhead | 3-5% | Failed calls still cost money |
The biggest line item is conversation context. In a 40-message thread, the first message gets sent to the API 40 times as part of the growing input context, so total context cost grows quadratically with thread length rather than linearly. Session management matters more than most guides acknowledge.
The second surprise is tool definitions. OpenClaw sends the schema for every enabled tool with every API call. If you have 15 tools enabled but only use 3 regularly, you are paying for 12 unused tool definitions on every single request. This can add 2,000-3,000 tokens per call on deployments with all integrations enabled.
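The context-replay effect is easy to quantify. Here is a minimal sketch (the per-message token count is illustrative) of how total input tokens grow with thread length:

```python
# Sketch: cumulative input tokens for an N-message thread, assuming each
# message averages `tokens_per_message` tokens and the full history is
# resent on every API call (numbers are illustrative, not measured).
def cumulative_context_tokens(n_messages: int, tokens_per_message: int = 500) -> int:
    # Call i carries i-1 prior messages plus the new one: 1 + 2 + ... + n
    return tokens_per_message * n_messages * (n_messages + 1) // 2

print(cumulative_context_tokens(10))   # -> 27500
print(cumulative_context_tokens(40))   # -> 410000
```

Doubling a thread's length roughly quadruples the total context bill, which is why session hygiene matters so much.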
Model Pricing: The 50x Gap You Are Paying For
The model you choose is the single largest cost lever. Here are current prices as of April 2026:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Relative Cost |
|---|---|---|---|
| Gemini 3.1 Flash | $0.10 | $0.40 | 1x (baseline) |
| GPT-5.4 Nano | $0.15 | $0.60 | 1.5x |
| Groq (Llama 4 Scout) | $0.06 | $0.18 | 0.6x |
| Claude Haiku 4.5 | $1.00 | $5.00 | 10x |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 30x |
| GPT-5.4 | $2.50 | $10.00 | 25x |
| Gemini 3.1 Pro | $2.00 | $12.00 | 20x |
| Claude Opus 4.6 | $5.00 | $25.00 | 50x |
The gap between Groq running Llama 4 Scout at $0.06/1M input tokens and Claude Opus 4.6 at $5.00/1M is roughly 83x. On identical usage, one user spends $3/month and another spends $250.
Most competitors skip Groq entirely in their cost guides. For routine OpenClaw tasks like scheduling, reminders, and simple lookups, Groq delivers sub-second responses at near-zero cost. The quality ceiling is lower than Opus or GPT-5.4 for complex reasoning, but the quality floor is more than adequate for 70-80% of what a typical OpenClaw agent does daily.
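To see what the table means in dollars, here is a quick sketch comparing monthly input-token spend across tiers, assuming an illustrative ~150,000 input tokens per day of identical usage:

```python
# Sketch: monthly input-token cost per model, using the April 2026 prices
# from the table above; the daily token volume is an assumption.
INPUT_PRICE_PER_1M = {  # USD per 1M input tokens
    "Groq (Llama 4 Scout)": 0.06,
    "Gemini 3.1 Flash": 0.10,
    "Claude Sonnet 4.6": 3.00,
    "Claude Opus 4.6": 5.00,
}

def monthly_input_cost(model: str, tokens_per_day: int = 150_000) -> float:
    return tokens_per_day * 30 * INPUT_PRICE_PER_1M[model] / 1_000_000

for model in INPUT_PRICE_PER_1M:
    print(f"{model}: ${monthly_input_cost(model):.2f}/month")
```

At this volume the spread runs from $0.27/month on Groq to $22.50/month on Opus, the same ~83x gap described above, before output tokens widen it further.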
The Heartbeat Tax
We covered heartbeat costs in our pricing breakdown, but the optimization angle matters here. Each heartbeat cycle consumes 8,000-15,000 input tokens regardless of whether anything needs attention. At 48 cycles per day (default 30-minute interval), that is roughly 480,000 input tokens daily before the agent does any useful work.
On Claude Opus 4.6, that is $72/month in heartbeat costs alone. On Gemini 3.1 Flash, the same heartbeat pattern costs $1.44/month.
Three levers to reduce heartbeat spend:
Switch the heartbeat model. Your heartbeat checks memory and triages tasks. It does not need frontier-model reasoning. GPT-5.4 Nano or Gemini 3.1 Flash handle it reliably; in practice, heartbeats running on Flash do not miss triages.
Extend the interval. Going from 30 minutes to 60 minutes cuts heartbeat costs in half. Going to 120 minutes cuts them by 75%. If your workflows tolerate a 2-hour response window, this is free money. Our heartbeat scheduling guide covers interval configuration in detail.
Trim your heartbeat.md. A bloated heartbeat file means more tokens per cycle. Keep instructions tight. Five clear directives beat twenty vague ones, both for cost and for agent behavior.
Model Routing: The Configuration That Saves 60-80%
Model routing means sending different tasks to different models based on what the task requires. It is the highest-impact optimization available.
The principle: 80% of OpenClaw interactions are routine. Checking email subjects, setting reminders, summarizing short texts, answering factual questions. These tasks run perfectly well on budget models. The remaining 20% involve complex reasoning, long-form writing, code generation, or multi-step planning. Those deserve a premium model.
Here is a recommended routing configuration:
Budget tier (Gemini 3.1 Flash or Groq): Heartbeat cycles, simple lookups, reminders, status checks, basic email triage. Roughly 80% of all API calls.
Mid tier (Claude Sonnet 4.6 or GPT-5.4): Content drafting, data analysis, complex email responses, multi-step workflows. Roughly 15% of calls.
Premium tier (Claude Opus 4.6): Code review, strategic planning, nuanced writing tasks. Roughly 5% of calls.
You configure this in your OpenClaw workspace files. Our memory configuration guide walks through the workspace setup. The key file is your agent’s system prompt where you define model preferences by task type.
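The exact syntax depends on your workspace setup, and the directive names below are hypothetical rather than official OpenClaw configuration, but the routing instructions in a system prompt might look something like:

```
# Model routing (directive names are illustrative, not official syntax)
default_model: gemini-3.1-flash        # budget tier: heartbeats, lookups, reminders
routes:
  - tasks: [drafting, analysis, complex-email, multi-step]   # mid tier, ~15%
    model: claude-sonnet-4.6
  - tasks: [code-review, planning, nuanced-writing]          # premium tier, ~5%
    model: claude-opus-4.6
```

Whatever the syntax, the design goal is the same: the budget model is the default, and premium models are opt-in per task type rather than the other way around.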
In our testing, this routing pattern reduced monthly API spend from $45 to $11 on a moderate-use deployment. Routine tasks showed no quality difference, and complex tasks still got premium-model quality because they were routed appropriately.
Prompt Caching: The 50-70% Savings Most People Miss
Anthropic and OpenAI both offer prompt caching, but most OpenClaw users never configure it. Prompt caching stores your system prompt, tool definitions, and frequently repeated context on the provider’s servers. When the cached prefix matches on subsequent calls, you pay a fraction of the full input token price.
Anthropic’s implementation is the most impactful for OpenClaw. Cached input tokens cost $0.30 per million on Claude Sonnet 4.6 instead of the standard $3.00. That is a 90% reduction on the cached portion. Since system prompts and tool definitions are identical across calls and represent 10-15% of your token consumption, the real-world savings on total spend are typically 15-25%.
The catch: caching works best when your system prompt stays stable. If you are constantly editing workspace files, the cache invalidates frequently and savings drop. Settle on your agent configuration first, then let caching compound the savings.
For OpenClaw specifically, the biggest caching win comes from tool definitions. Those 2,000-3,000 tokens of tool schemas that ship with every request become nearly free once cached.
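As a back-of-envelope check: the 25% cached share below is an assumption combining system prompt and tool definitions from the breakdown table, and the 10x cache discount mirrors Anthropic's $0.30 vs $3.00 pricing above.

```python
# Sketch: input spend after prompt caching, assuming the cacheable prefix
# (system prompt + tool definitions) is ~25% of input tokens and cached
# tokens bill at 10% of the normal input price.
def spend_with_caching(base_input_spend: float,
                       cached_share: float = 0.25,
                       cached_price_ratio: float = 0.10) -> float:
    uncached = base_input_spend * (1 - cached_share)
    cached = base_input_spend * cached_share * cached_price_ratio
    return uncached + cached

print(f"${spend_with_caching(100.0):.2f}")  # a $100 input bill drops to $77.50
```

That 22.5% reduction sits at the top of the 15-25% real-world range, which you only reach if your prefix stays stable enough to keep the cache warm.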
Context Management: Stop Paying for Old Messages
Conversation context is your largest token consumer at 40-50% of total spend. Every message in a thread gets re-sent as input tokens on the next API call. A 40-message conversation means message #1 has been sent (and billed) 40 times.
Three ways to manage this:
Start new sessions frequently. Instead of one 50-message marathon, break work into 5-10 message focused sessions. Each new session resets the context window. This alone can cut context costs by 60-70% for users who tend toward long conversations.
Enable compaction. OpenClaw’s compaction feature summarizes old messages to reduce context size. It costs a small amount of tokens to run the summarization, but saves much more on subsequent calls. The tradeoff is that summarized context loses some nuance. For most workflows, this is acceptable.
Disable unused tools. Every enabled tool adds its schema to every API call. If you have browser automation, GitHub integration, and email tools enabled but only use email daily, disable the others when not needed. Turning off tools you only touch weekly, rather than keeping them always enabled, can trim 2,000-3,000 tokens per call.
Tracking Your Actual Spend
Every competitor article says “monitor your API dashboard.” None of them explain what to look for. Here is a practical method.
Weekly check (5 minutes): Log into your API provider dashboard (OpenAI, Anthropic, or Google AI Studio). Look at the daily spend graph. You are looking for two things: is the trend stable, and are there spikes on days you did not expect heavy usage. Spikes usually mean a runaway automation or a long conversation you forgot about.
Monthly audit (15 minutes): Compare your total API spend against your expected budget. Break it down by model if your provider supports it. Anthropic’s dashboard shows per-model usage. OpenAI’s shows per-API-key usage. If one model accounts for more than 50% of spend and it is not your budget model, your routing is misconfigured.
The quick estimation formula: Take your average daily message count, multiply by average tokens per message (roughly 3,000-5,000 for a typical OpenClaw interaction including context), multiply by your model's input price per million tokens, then add heartbeat cost (48 cycles x ~10,000 tokens x input price). That gives you a ballpark monthly estimate within 20-30% of actual.
For example: 20 messages/day x 4,000 tokens x $0.10/1M (Gemini Flash) = $0.008/day for messages. Plus heartbeat: 48 x 10,000 x $0.10/1M = $0.048/day. Total: roughly $1.68/month. Switch to Claude Sonnet 4.6 for everything and the same calculation gives roughly $50/month. The model choice drives everything.
Four Real Cost Scenarios
Based on community data and typical configurations:
Minimal ($3-8/month): Groq or Gemini Flash for everything, heartbeat on Flash at 60-minute intervals, 10-15 messages per day. Best for: personal task management, simple reminders, basic email triage.
Optimized ($10-20/month): Model routing with 80% budget / 15% mid-tier / 5% premium, heartbeat on Flash at 30-minute intervals, 20-30 messages per day. Best for: business users who want quality outputs on important tasks without overpaying on routine work.
Standard unoptimized ($40-80/month): Single mid-tier model for everything including heartbeat, default 30-minute interval, 20-30 messages per day. This is where most users land if they pick Claude Sonnet or GPT-5.4 as their default and never configure routing.
Expensive unoptimized ($200-400+/month): Claude Opus or GPT-5.4 for everything, 15-minute heartbeat interval, heavy daily usage, multiple integrations enabled. This is the scenario Reddit users describe when they post about surprise bills.
The gap between “optimized” and “standard unoptimized” is a 4-6x cost difference for comparable output quality on routine tasks.
Frequently Asked Questions
How much does OpenClaw cost per month in API fees alone?
Between $3 and $400+, depending entirely on model selection and configuration. A well-configured setup using model routing and budget models for routine tasks runs $10-20/month. An unoptimized setup using Claude Opus for everything including heartbeats can exceed $300/month on moderate usage. Our pricing breakdown covers total costs including hosting.
Which model should I use to keep costs under $10/month?
Gemini 3.1 Flash or Groq (Llama 4 Scout) as your default. Both deliver adequate quality for routine OpenClaw tasks at roughly $0.10/1M input tokens. If you need higher quality for specific tasks, configure model routing to send only those tasks to a premium model while keeping 80%+ of traffic on the budget option.
Does the heartbeat cost money even when nothing happens?
Yes. Every heartbeat cycle consumes 8,000-15,000 tokens regardless of whether the agent takes action. On budget models this costs pennies. On Claude Opus 4.6 it costs roughly $2.40/day or $72/month. The heartbeat model and interval are the two highest-leverage cost settings in your entire OpenClaw configuration.
Can I use completely free models with OpenClaw?
You can run local models through Ollama at zero API cost. The tradeoff is quality: local models running on consumer hardware are noticeably worse at complex reasoning, tool use, and multi-step planning compared to cloud APIs. Google AI Studio offers a free tier for Gemini models with rate limits. For testing and light personal use, these options work. For anything business-critical, budget cloud models like Gemini Flash at $0.10/1M tokens are cheap enough that the quality improvement justifies the cost.
How do I know if my model routing is working?
Check your API provider dashboard weekly. If your budget model does not account for 70-80% of total API calls, routing is not configured correctly or your task distribution does not match your routing rules. Anthropic’s dashboard breaks down usage by model. OpenAI shows usage by API key, so you may need separate keys per tier.
Is prompt caching worth setting up?
If you use Claude models, yes. Cached input tokens on Sonnet 4.6 cost $0.30/1M instead of $3.00/1M, a 90% reduction on cached content. Since your system prompt and tool definitions repeat on every call, caching delivers 15-25% total savings with zero quality impact. The main requirement is keeping your system prompt stable rather than editing it constantly.
Why is my OpenClaw bill so much higher than expected?
Three common causes: (1) running a premium model as default for all tasks including heartbeats, (2) long conversation sessions that accumulate massive context, and (3) forgotten automations or test workflows still running in the background. Start by checking which model handles your heartbeat, then look at your average session length, then audit active automations.
What is the single most impactful cost change I can make?
Switch your heartbeat and routine tasks to Gemini 3.1 Flash or Groq. If you are currently running Claude Sonnet 4.6 as your default for everything, this one change typically saves $30-50/month. Model routing is more work to configure but delivers the largest total savings.
Key Takeaways
- Model choice drives 90%+ of your API cost variation. The gap between cheapest and most expensive is 50-80x.
- Conversation context (40-50% of tokens) and tool definitions (20-30%) consume far more tokens than the actual model responses you read.
- Model routing with 80% budget / 15% mid / 5% premium models cuts API costs by 60-80% with minimal quality tradeoff on routine work.
- Heartbeat optimization (cheap model + longer interval) is the single easiest win, saving $30-70/month for users currently running premium defaults.
- Prompt caching on Claude models reduces cached input costs by 90%. Combined with stable system prompts, this adds 15-25% total savings.
- Track your spend weekly with a 5-minute dashboard check. Monthly surprises come from ignoring the dashboard, not from unpredictable pricing.
SFAI Labs