Cost & Tokens

I Spent $3,000 in My First Month

Real cost breakdown. Why running Opus 24/7 is expensive. The 5 habits that brought it down to $200.

E— who learned the hard way

The Bill That Made Me Sick

First month: $3,000+ on API costs alone.

I was running Claude Opus for everything. Every question. Every task. Every idle conversation. I didn't understand tokens. I didn't understand context windows. I didn't understand why long conversations get so much more expensive with every message.

Nobody told me. Every guide out there says "it's affordable." And it is — if you know what you're doing. I didn't know what I was doing.

The bill came. I stared at it. Then I learned everything in this guide the hard way so you don't have to.

How AI Costs Actually Work

You pay per token. A token is roughly 4 characters of text. Every space, every comma, every word gets counted.

You pay for two things:
- Input tokens: what you send to the AI (your message + the entire conversation history)
- Output tokens: what the AI generates back
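To make the per-token billing concrete, here is a minimal sketch that estimates tokens with the 4-characters rule of thumb and prices a single request. The function names and the two example prices are my own placeholders, not real quotes from any provider, and real tokenizers will give somewhat different counts.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    return max(1, round(len(text) / chars_per_token))


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Dollar cost of one request, with prices quoted per million tokens."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


# Placeholder prices, NOT real quotes: check your provider's pricing page.
EXAMPLE_INPUT_PRICE = 10.0   # dollars per million input tokens (assumed)
EXAMPLE_OUTPUT_PRICE = 50.0  # dollars per million output tokens (assumed)

prompt = "Summarize this paragraph for me, please."
tokens_in = estimate_tokens(prompt)
cost = request_cost(tokens_in, 500, EXAMPLE_INPUT_PRICE, EXAMPLE_OUTPUT_PRICE)
print(f"{tokens_in} input tokens, about ${cost:.4f} for this request")
```

Input and output are priced separately here because providers typically charge different rates for each.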

Here's the kicker that nobody explains clearly: every message in a conversation includes ALL previous messages. The AI doesn't remember your conversation — it re-reads the whole thing every single time.

So a 50,000-token conversation history times 10 more messages = at least 500,000 input tokens consumed, and more than that because the history keeps growing with each reply. Not 50,000. Half a million.
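The arithmetic can be sketched in a few lines. This assumes the simplest possible model: every turn resends the whole history, and the history grows by a fixed amount per turn. The function name and the 1,000-token growth figure are illustrative assumptions.

```python
def cumulative_input_tokens(history_tokens: int, turns: int,
                            tokens_added_per_turn: int = 0) -> int:
    """Total input tokens billed over `turns` further messages.

    Each turn resends the whole history; the history then grows by
    `tokens_added_per_turn` (new message plus reply) before the next turn.
    """
    total = 0
    for _ in range(turns):
        total += history_tokens
        history_tokens += tokens_added_per_turn
    return total


print(cumulative_input_tokens(50_000, 10))         # 500000
print(cumulative_input_tokens(50_000, 10, 1_000))  # 545000
```

The first call is the flat-history case from the text. The second shows how the total creeps up once each turn adds to the history.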

Current pricing (approximate):

Prices change fast. Always check the provider's pricing page for current numbers.

Danger Zone

The 5 Cost Traps

These are the five ways people accidentally burn money. I hit all five.

Trap 1: Using Opus for everything. Most tasks don't need the most powerful model. Sorting files? Haiku. Writing a blog post? Sonnet. Building system architecture? Opus. Match the model to the task. You don't drive a Ferrari to get groceries.

Trap 2: Marathon conversations. Every message carries the full history, so the total cost grows roughly with the square of the conversation's length. At a steady pace, a 3-hour conversation costs about 3x what three 1-hour conversations doing the same work would. Break your sessions. One topic per conversation. Hand off context between sessions instead of keeping one thread alive forever.
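Here is a rough way to compare one marathon thread against several short sessions. It assumes a constant number of tokens added per turn, which real conversations won't match exactly. The function name and the workload numbers are made up for illustration.

```python
def session_input_tokens(turns: int, tokens_per_turn: int, base_context: int = 0) -> int:
    """Input tokens billed over one session.

    Turn i resends the base context plus i turns' worth of history, so the
    total is base_context * turns + tokens_per_turn * (1 + 2 + ... + turns).
    """
    return base_context * turns + tokens_per_turn * turns * (turns + 1) // 2


# Assumed workload: 60 turns of work, 1,000 tokens added per turn.
marathon = session_input_tokens(turns=60, tokens_per_turn=1_000)
three_short = 3 * session_input_tokens(turns=20, tokens_per_turn=1_000)

print(marathon)     # 1830000
print(three_short)  # 630000
print(f"marathon / short sessions = {marathon / three_short:.1f}x")  # 2.9x
```

Pass a `base_context` to account for a handoff note or memory file reloaded in every session. That narrows the gap a little, but the short sessions still come out well ahead under these assumptions.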

Trap 3: Bloated memory files. Your agents.md, memory.md, soul.md — these get loaded every single turn. If they grow to 5,000 tokens, you're paying for those tokens on EVERY message. That's thousands of tokens of overhead before you even ask a question. Keep them lean.

Trap 4: Cron jobs and heartbeats. Running an agent check every 15 minutes = 96 executions per day. At even $0.10 per execution, that's about $10/day. Nearly $300/month. For a heartbeat. Most of those checks find nothing. You're paying for an agent to wake up, look around, shrug, and go back to sleep.
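The heartbeat math is worth running for your own interval before you schedule anything. A small sketch, using the $0.10 per execution figure from above as an assumed cost and hypothetical function names:

```python
def runs_per_day_for_interval(interval_minutes: int) -> int:
    """How many times a job fires per day at a fixed interval."""
    return (24 * 60) // interval_minutes


def monthly_run_cost(runs_per_day: float, cost_per_run: float, days: int = 30) -> float:
    """Monthly cost of a recurring job, assuming every run costs the same."""
    return runs_per_day * cost_per_run * days


heartbeat_runs = runs_per_day_for_interval(15)
print(heartbeat_runs)                                              # 96
print(f"${monthly_run_cost(heartbeat_runs, 0.10):.2f} per month")  # $288.00 per month
```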

Trap 5: Not using sub-agents. One main agent doing everything = massive context window. Every task inherits the full conversation. Delegating to sub-agents keeps the main conversation lightweight. Complex work goes to temporary agents that terminate after the task. Their context dies with them. Your wallet thanks you.

Trap 6 (bonus): Switching to the cheapest model. When people see high token costs, their first reaction is "use a cheaper model!" This is a trap. Small models (30B, 70B parameters) have fatal problems. Weak context understanding — they lose track of earlier conversation. They can't chain 5+ tool calls without getting stuck. They're more vulnerable to prompt injection attacks. And frequent errors mean you retry the same request 3-4 times.

The retries cost more than just using the right model once. The cheapest model per token is often the most expensive model per task.
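One way to see this is to compare cost per finished task instead of cost per attempt. The sketch below assumes each attempt succeeds independently with a fixed probability, so the expected number of attempts is 1 divided by the success rate. The prices and success rates are invented for illustration, not measurements.

```python
def expected_cost_per_task(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend to get one successful result, retrying until it works."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Assumed figures for illustration only.
cheap = expected_cost_per_task(cost_per_attempt=0.02, success_rate=0.25)
strong = expected_cost_per_task(cost_per_attempt=0.06, success_rate=0.95)
print(f"cheap: ${cheap:.3f}, strong: ${strong:.3f}")  # cheap: $0.080, strong: $0.063
```

With these numbers the cheap model is a third of the price per attempt and still loses per task. That is before counting the time you spend noticing the failures.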

Solutions

The 5 Habits That Fixed It

These five changes took me from $3,000/month to $200/month. In order of impact.

Habit 1: Model switching. I use Haiku for quick questions, Sonnet for daily work, Opus only for complex reasoning and architecture decisions. This single change saved 60% immediately. Most of my usage was Opus answering questions that Haiku could handle just fine.
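If you want to make the switching automatic, a lookup table is enough to start. This is a sketch of the idea, not how any particular tool does it: the task labels are my own invention, and the tier values are just names, not real API model identifiers.

```python
from enum import Enum


class Tier(Enum):
    HAIKU = "haiku"    # quick, cheap
    SONNET = "sonnet"  # daily work
    OPUS = "opus"      # complex reasoning and architecture


# Hypothetical task labels mapped to the cheapest tier that handles them well.
ROUTES = {
    "quick_question": Tier.HAIKU,
    "file_sorting": Tier.HAIKU,
    "writing": Tier.SONNET,
    "daily_coding": Tier.SONNET,
    "architecture": Tier.OPUS,
    "complex_reasoning": Tier.OPUS,
}


def pick_tier(task_type: str) -> Tier:
    """Return the tier for a task, defaulting to the middle tier when unsure."""
    return ROUTES.get(task_type, Tier.SONNET)


print(pick_tier("file_sorting").value)  # haiku
print(pick_tier("architecture").value)  # opus
print(pick_tier("something_new").value) # sonnet
```

Defaulting unknown tasks to the middle tier is a judgment call. It avoids both the Opus-for-everything bill and the retry spiral of sending hard work to the smallest model.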

Habit 2: Session hygiene. Short, focused sessions. One topic per conversation. When I finish a task, I write a handoff note and start fresh. No more 3-hour marathon threads where the last message costs 50x what the first one did.

Habit 3: Memory diet. I keep MEMORY.md under 200 lines. I moved details to separate reference files that only load when specifically needed. That trimmed about 70% of my per-message token waste. Your memory file is not a diary. It's a cheat sheet.
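A quick check like the one below can tell you when a memory file has outgrown its budget. It is a sketch: the 200-line limit comes from the habit above, the token figure uses the rough 4-characters rule, and the function name is hypothetical.

```python
def memory_report(text: str, max_lines: int = 200, chars_per_token: float = 4.0) -> dict:
    """Summarize a memory file's size and flag it if it exceeds the line budget."""
    lines = len(text.splitlines())
    return {
        "lines": lines,
        "estimated_tokens": round(len(text) / chars_per_token),
        "over_budget": lines > max_lines,
    }


# Stand-in for a real file's contents.
sample = "\n".join(f"- note {i}" for i in range(250))
report = memory_report(sample)
print(report["lines"], report["over_budget"])  # 250 True
```

In practice you would feed it the contents of your own file, for example with `pathlib.Path("MEMORY.md").read_text()`, and run it before each working day or in a pre-commit hook.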

Habit 4: Kill the cron. Replaced 15-minute heartbeats with event-driven triggers. The agent only runs when something actually happens — a new message, a file change, a webhook. Went from 96 executions/day to about 10.
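Real event sources (webhooks, file watchers, message queues) depend on your platform, so I won't pretend to show one here. What I can sketch is the core idea behind them: put a free local check in front of the paid agent call, and only pay when something changed. The class and method names below are hypothetical.

```python
import hashlib
from typing import Callable, Optional


class ChangeGate:
    """Runs the (expensive) agent only when the observed payload has changed."""

    def __init__(self, run_agent: Callable[[bytes], None]) -> None:
        self._run_agent = run_agent
        self._last_digest: Optional[str] = None

    def on_check(self, payload: bytes) -> bool:
        """Return True if the agent was run, False if nothing changed."""
        digest = hashlib.sha256(payload).hexdigest()
        if digest == self._last_digest:
            return False  # nothing new: skip the paid call entirely
        self._last_digest = digest
        self._run_agent(payload)
        return True


calls = []  # stand-in for real agent invocations
gate = ChangeGate(run_agent=calls.append)
for snapshot in [b"inbox: empty", b"inbox: empty",
                 b"inbox: 1 new message", b"inbox: 1 new message"]:
    gate.on_check(snapshot)
print(len(calls))  # 2
```

Four checks, two paid runs. Note that the very first check always runs the agent, because there is nothing to compare against yet.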

Habit 5: Track everything. I check my API dashboard weekly. I set budget alerts at $50, $100, and $200. I know exactly where every dollar goes. You can't fix what you don't measure.
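If your provider's dashboard doesn't offer alerts, the logic is simple enough to run yourself against whatever spend figure you can pull. A minimal sketch, using the three thresholds above; how you fetch the spend and deliver the alert is left out, and the function name is an assumption.

```python
from typing import List, Sequence

ALERT_THRESHOLDS = (50.0, 100.0, 200.0)  # dollars, from the habit above


def newly_crossed(previous_spend: float, current_spend: float,
                  thresholds: Sequence[float] = ALERT_THRESHOLDS) -> List[float]:
    """Thresholds passed since the last check, so each alert fires only once."""
    return [t for t in thresholds if previous_spend < t <= current_spend]


print(newly_crossed(40.0, 120.0))   # [50.0, 100.0]
print(newly_crossed(120.0, 130.0))  # []
```

Comparing against the previous reading is what keeps you from getting the same $50 alert every time the script runs.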

Bonus habit: The brain/muscle split. A smart architecture from the community: use your most powerful model (Opus) only for decisions and coordination — the "brain." For execution tasks — the "muscle" — use specialized, cheaper tools. Coding goes to Codex (cheap and strong at code). Web search goes to Brave API (cheap and fast). News and social data go to Grok API (plugged into social media).

The brain thinks and delegates. The muscle executes. The brain saves massive tokens because it never does the heavy lifting itself. This is how one builder went from $3,000/month to under $200.
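The shape of that split looks something like this. Everything here is a stand-in: `brain_plan` fakes the one expensive planning call, and the workers are stub functions where real search or coding tools would go. None of it reflects an actual API.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # (kind of work, instructions for the worker)


def brain_plan(goal: str) -> List[Step]:
    """Stand-in for the expensive model: the single paid 'brain' call."""
    return [("search", f"background for: {goal}"),
            ("code", f"implement: {goal}")]


# Stand-ins for cheaper specialized tools (hypothetical; swap in real clients).
WORKERS: Dict[str, Callable[[str], str]] = {
    "search": lambda task: f"[search results] {task}",
    "code": lambda task: f"[code draft] {task}",
}


def run(goal: str) -> List[str]:
    """The brain plans once; each step is executed by a cheaper worker."""
    return [WORKERS[kind](task) for kind, task in brain_plan(goal)]


for line in run("rate limiter"):
    print(line)
```

The point of the structure is that the expensive model sees only the goal and the short results, never the bulky intermediate work.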

The Real Numbers

Here's my actual spend over four months:

Current setup: ~$200/month for daily agent use across multiple projects. That includes Opus for architecture, Sonnet for writing, Haiku for everything else.

Could I get it lower? Probably. But $200/month for a working partner that handles coding, writing, research, and project management is genuinely good value. The key is making it $200 instead of $3,000.

Real Talk

The "$100/Month" Myth

Every YouTube video says you can run Claude for $100/month. Let me be honest: if you're actually building daily, that number is fantasy.

Here's why. Every time your agent starts a session, it loads context: your CLAUDE.md, your memory files, your project structure. That's tokens — thousands of them — before you even ask a question. Every. Single. Session.

My real daily cost on Sonnet: $30-50/day. Not because I'm wasteful. Because I'm actually working. Building, debugging, deploying, iterating. Vivienne (who runs on MiniMax M2.5) burns through sessions just by staying informed.

The honest breakdown:

The $100/month tips from YouTube work if you:
- Only use your agent a few times a week
- Keep conversations short
- Don't load heavy context files
- Use Haiku for most things

They don't work if you:
- Build daily with an agent for hours
- Load project context every session
- Run multiple projects
- Actually ship things

That doesn't mean it's not worth it. A builder spending $600/month on AI that replaces weeks of manual work is getting incredible value. The mistake isn't the spending — it's not knowing what to expect.

Cost-saving tricks that actually work at scale:
- Prompt caching: can cut the cost of repeated input, like a system prompt, by up to 90% (if your provider supports it)
- OpenRouter: routes to cheaper providers, can save 50-80% on some models
- Claude Max subscription ($100-200/month): much higher Sonnet usage limits than Pro (not unlimited), often way cheaper than API if you're a heavy user
- Context discipline: keep CLAUDE.md under 200 lines, split reference docs
- Session breaks: fresh sessions instead of marathon threads
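To see what prompt caching is worth for your own setup, you can run the numbers on how much of each request is repeated. This sketch assumes cached input is billed at 10% of the normal input price, which is where an "up to 90%" figure would come from. Actual discounts, cache-write surcharges, and expiry rules vary by provider and are ignored here. The price is a placeholder.

```python
def input_cost_with_cache(total_input_tokens: int, cached_tokens: int,
                          price_per_mtok: float,
                          cached_price_multiplier: float = 0.1) -> float:
    """Input cost when part of the prompt is served from cache at a discount."""
    if not 0 <= cached_tokens <= total_input_tokens:
        raise ValueError("cached_tokens must be between 0 and total_input_tokens")
    fresh = total_input_tokens - cached_tokens
    billable = fresh + cached_tokens * cached_price_multiplier
    return billable * price_per_mtok / 1_000_000


PRICE = 10.0  # dollars per million input tokens (placeholder, not a real quote)
without = input_cost_with_cache(10_000, 0, PRICE)
with_cache = input_cost_with_cache(10_000, 8_000, PRICE)
print(f"without cache: ${without:.3f}, with cache: ${with_cache:.3f}")
print(f"saved: {100 * (1 - with_cache / without):.0f}%")
```

With 80% of the input cached, the saving under these assumptions is about 72%, not 90%. The headline number only applies to the cached portion.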

Be honest with yourself about how you actually use AI. Then budget for that. Not for the YouTube version.

Free and Cheap Options

Not everyone needs API access. If you're starting out, subscriptions are simpler and cheaper.

ChatGPT Plus ($20/month) — includes GPT-4o and Codex tokens. Good starting point.
Claude Pro ($20/month) — includes generous Sonnet usage. Great for writing and reasoning.
Google Gemini — has a free tier. Useful for experimentation.
For coding: Cursor, Windsurf, and Claude Code all have subscription models that include AI usage.

The API is for power users who need programmatic access, custom integrations, or high-volume usage. If you're just starting out, use the subscription first. You can always move to the API later when you understand your usage patterns.

Critical

The One Rule

Never let an agent run unsupervised with your API key and no spending limit.

Set budget caps. Set alerts. Check weekly.

An agent that runs overnight with Opus and a long context can burn $50+ in a single session. An agent with cron jobs and no budget cap can rack up hundreds before you notice.

Your agent isn't trying to waste your money. It just doesn't know what things cost. It will happily use Opus to sort a list of files. It will happily maintain a 200,000-token conversation to answer a yes/no question. It will happily run a heartbeat check 96 times a day finding nothing each time.

You are the budget. Until spending limits are built into every platform (they're not yet), the only thing between your API key and a surprise bill is your own discipline.
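Until your platform enforces a cap for you, a client-side guard is a cheap backstop. The sketch below is a hypothetical wrapper you would call before each paid request, with your own cost estimate. It only protects code paths that actually go through it, so it does not replace provider-side limits where those exist.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spending past the cap."""


class BudgetGuard:
    """Tracks estimated spend and refuses charges that would exceed the cap."""

    def __init__(self, cap_dollars: float) -> None:
        self.cap = cap_dollars
        self.spent = 0.0

    def charge(self, estimated_cost: float) -> None:
        """Record a charge, or raise before the money is spent."""
        if self.spent + estimated_cost > self.cap:
            raise BudgetExceeded(
                f"${self.spent + estimated_cost:.2f} would exceed cap of ${self.cap:.2f}"
            )
        self.spent += estimated_cost


guard = BudgetGuard(cap_dollars=5.00)
for call_number in range(1, 5):
    try:
        guard.charge(2.00)  # assumed cost estimate for one agent call
        print(f"call {call_number}: ok, spent ${guard.spent:.2f}")
    except BudgetExceeded as err:
        print(f"call {call_number}: blocked ({err})")
        break
```

The check happens before the spend is recorded, so a blocked call leaves the running total untouched.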

Set the cap. Set the alerts. Check the dashboard. Every week.
