Cost control in production: monitoring and alerting

How to track AI API costs in production, set meaningful alerts, and avoid the surprise bill that ends projects.

A production application with no cost monitoring can accumulate thousands of dollars in API costs before anyone notices, and a bill of that size can end a project.

Cost monitoring is not optional. Here is how to do it before you go to production.

Track token usage per request

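Most AI API responses include a usage object with token counts, for example prompt_tokens and completion_tokens in OpenAI-style APIs (other providers use different field names). Log both counts. Write them to a persistent store where you can query them later, not just to the console.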

After a week of production traffic, you will know your average tokens per request, your most expensive features, and your cost per user, provided each log record also carries the feature and user it belongs to.
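As a rough illustration, per-request logging could look like the sketch below. It assumes SQLite as the persistent store and a usage dict shaped like the prompt_tokens / completion_tokens fields mentioned above. The table name usage_log and the helpers open_usage_db, log_usage, and tokens_by_feature are hypothetical names chosen for this example, not part of any provider's SDK.

```python
import sqlite3
import time


def open_usage_db(path=":memory:"):
    """Open (or create) the usage store. Pass a file path in real use."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS usage_log (
               ts REAL NOT NULL,
               feature TEXT NOT NULL,
               user_id TEXT NOT NULL,
               prompt_tokens INTEGER NOT NULL,
               completion_tokens INTEGER NOT NULL
           )"""
    )
    return conn


def log_usage(conn, feature, user_id, usage):
    """Record one request. `usage` is assumed to be a dict with
    prompt_tokens and completion_tokens keys."""
    conn.execute(
        "INSERT INTO usage_log VALUES (?, ?, ?, ?, ?)",
        (
            time.time(),
            feature,
            user_id,
            usage["prompt_tokens"],
            usage["completion_tokens"],
        ),
    )
    conn.commit()


def tokens_by_feature(conn):
    """Return (feature, request_count, prompt_total, completion_total) rows."""
    return conn.execute(
        "SELECT feature, COUNT(*), SUM(prompt_tokens), SUM(completion_tokens) "
        "FROM usage_log GROUP BY feature ORDER BY feature"
    ).fetchall()
```

Grouping by user_id instead of feature gives the per-user view. Any queryable store would serve the same purpose; SQLite is used here only because it needs no setup.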

Set a daily spend alert

Most major API providers support spend alerts or budget limits; check your provider's billing settings. Set one alert at 50% of your expected daily budget and another at 90%. The 50% alert is informational. The 90% alert requires action.
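If your provider's built-in alerts are not granular enough, the same two-threshold rule can be checked in your own code against the spend you compute from logged tokens. The sketch below is one way to express it. The function name spend_alert_level and the "info" / "action" labels are assumptions made for this example.

```python
def spend_alert_level(spent_today, daily_budget, info_at=0.5, action_at=0.9):
    """Classify today's spend against the daily budget.

    Returns "action" at or above 90% of budget, "info" at or above 50%,
    and None below that. Both thresholds are configurable.
    """
    if daily_budget <= 0:
        raise ValueError("daily_budget must be positive")
    fraction = spent_today / daily_budget
    if fraction >= action_at:
        return "action"
    if fraction >= info_at:
        return "info"
    return None
```

A scheduled check could call this every few minutes and notify whoever is on call when the result changes.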

The three most common cost surprises

Unbounded loops. An agent that calls the API in a loop without a termination condition. Always set a maximum iteration count.
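A minimal sketch of a bounded agent loop follows. Here step is a hypothetical callable standing in for one API call plus whatever logic decides whether the agent is finished. The name run_agent and the default of 10 iterations are illustrative choices, not recommendations.

```python
def run_agent(step, max_iterations=10):
    """Run `step` until it reports completion or the iteration cap is hit.

    `step(i)` is assumed to make one API call and return (done, result).
    """
    for i in range(max_iterations):
        done, result = step(i)
        if done:
            return result
    # Fail loudly instead of continuing to spend tokens.
    raise RuntimeError(
        f"agent stopped after {max_iterations} iterations without finishing"
    )
```

Raising an error, rather than returning a partial result silently, makes runaway loops visible in error tracking as well as on the bill.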

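Logging prompts at full size. In production with high traffic, this generates enormous log volumes, and log ingestion and storage are billed too. Log a truncated preview or only the token counts instead.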
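One possible way to keep prompt logs small is to store a short preview, the full length, and a hash, so identical prompts can still be matched without keeping the whole text. This is a sketch of one mitigation, not the only option. The helper name prompt_log_record and the 200-character default are assumptions.

```python
import hashlib


def prompt_log_record(prompt, preview_chars=200):
    """Build a compact log record for a prompt instead of logging it whole."""
    return {
        "preview": prompt[:preview_chars],  # first characters only
        "length": len(prompt),              # full size, in characters
        "sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
    }
```

The record stays roughly the same size no matter how long the prompt is.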

Forgotten background jobs. A batch job that ran once in development becomes a daily job in production. Verify the expected token count of any scheduled job before enabling it.
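A back-of-the-envelope estimate before enabling a scheduled job might look like this. The function name estimate_job_cost is hypothetical, and the prices are parameters you fill in from your provider's current pricing page. The figures in the usage example are placeholders, not real prices.

```python
def estimate_job_cost(
    items,
    prompt_tokens_per_item,
    completion_tokens_per_item,
    input_price_per_million,
    output_price_per_million,
):
    """Estimate the cost of one job run, in the same currency as the prices.

    Prices are per one million tokens. Input and output tokens are
    priced separately because providers typically bill them differently.
    """
    prompt_total = items * prompt_tokens_per_item
    completion_total = items * completion_tokens_per_item
    return (
        prompt_total * input_price_per_million
        + completion_total * output_price_per_million
    ) / 1_000_000


# Placeholder numbers: 10,000 items, 1,500 prompt and 500 completion
# tokens each, at hypothetical prices of 3.00 and 15.00 per million tokens.
per_run = estimate_job_cost(10_000, 1_500, 500, 3.0, 15.0)
per_month = per_run * 30  # if the job runs daily
```

Multiplying the per-run figure by the schedule frequency is the step that is easy to forget when a one-off development job becomes a daily one.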

Start here: Add token logging to your next API call. Even in development. The habit of knowing what requests cost prevents the surprise later.
