Before you write application code, make one raw API call. Not through a library. Not through a wrapper. Directly to the endpoint, with curl or a simple fetch. See what comes back.
This sounds elementary. It is not. Understanding the raw response prevents a category of bugs that trip up most developers when they start building on top of abstractions.
What the request looks like
Every major model API accepts the same basic structure: a model name, a messages array with system and user roles, and a max_tokens limit. The messages array contains the conversation. System is the standing instruction. User is the current input.
What the response looks like
Three things to note: the content is nested under choices[0].message.content — not at the top level. The finish_reason tells you why the model stopped. The usage object tells you exactly what you were charged for.
The three things to set up first
Rate limit handling. Every API has rate limits. Handle the 429 response before you build anything else. A simple exponential backoff prevents most rate limit issues in production.
Error handling. APIs fail. Networks fail. Assume every API call can fail and handle it explicitly.
Token counting. Know how many tokens your requests consume before you deploy. Most providers offer a tokenizer — use it to estimate costs.
Start here: Make a single API call with curl to your chosen model. Read the raw response. Find the content, the finish_reason, and the token counts.