Rate limits, retries, and resilient code
Your script worked five times. The sixth time it hit a rate limit and crashed your app. This lesson keeps your code calm when the API gets noisy.
Even the fanciest restaurant has a limit on how many orders the kitchen takes per minute. Your job is to wait politely, not bang on the counter.
Common failure modes:
- 429 Rate limit: too many requests per minute (RPM) or tokens per minute (TPM)
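- 5xx server errors: transient problems on the provider's side, usually worth retrying
- 400 Bad request: your fault (malformed input); retrying the same request will not help
- 401 Auth error: bad or expired key; fix the key instead of retrying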
- Timeouts: network or model latency spike
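One way to turn the list above into code is a small helper that decides whether a failure is worth retrying. This is a minimal sketch, not part of any library: the name is_retryable is hypothetical, and treating a timeout (no status code at all) as retryable is an assumption that suits most read-style requests.

```python
from typing import Optional


def is_retryable(status_code: Optional[int]) -> bool:
    """Return True if a failed request is worth retrying.

    status_code is None when no HTTP response arrived at all,
    for example after a timeout.
    """
    if status_code is None:
        # Assumption: timeouts are transient, so try again.
        return True
    if status_code == 429:
        # Rate limited: wait, then retry.
        return True
    # Any 5xx is a server-side error; 400 and 401 fall through to False.
    return 500 <= status_code <= 599
```

With this helper, 429 and 503 come back as retryable, while 400 and 401 do not, because resending the same bad request or the same bad key cannot succeed.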
The fix for 429 and 5xx: exponential backoff with jitter. Wait 1 second, then 2, then 4, then 8, each with a small random offset so that many clients hitting the same limit do not all retry at the same instant.
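To see what the schedule looks like without any library, here is a rough standard-library sketch. The names backoff_delay and retry_with_backoff, the 30 second cap, and the 0.5 second jitter range are all illustrative assumptions, not values taken from any API's documentation.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=30.0, jitter=0.5):
    """Seconds to wait before retry number `attempt` (0-based).

    The base delay doubles each attempt (1, 2, 4, 8, ...), is capped
    at `cap`, and gets a random offset between 0 and `jitter` added.
    """
    delay = min(cap, base * (2 ** attempt))
    return delay + random.uniform(0, jitter)


def retry_with_backoff(func, max_attempts=5,
                       should_retry=lambda exc: True, sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is used up.

    should_retry decides whether an exception is worth another try.
    sleep is injectable so the function can be tested without waiting.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:
            is_last = attempt == max_attempts - 1
            if is_last or not should_retry(exc):
                raise  # give up and surface the original error
            sleep(backoff_delay(attempt))
```

Passing a should_retry function keeps errors such as 400 and 401 from being retried at all; the default here retries everything, which is only reasonable for a quick experiment.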
Using tenacity:
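from openai import OpenAI  # assumes the official openai package (v1 or later)
from tenacity import retry, wait_random_exponential, stop_after_attempt

client = OpenAI()  # reads OPENAI_API_KEY from the environment

@retry(wait=wait_random_exponential(min=1, max=30), stop=stop_after_attempt(5))
def call_llm(messages):
    return client.chat.completions.create(model="gpt-4o-mini", messages=messages)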
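Five tries, with randomized waits whose upper bound grows exponentially up to 30 seconds, then surrender: tenacity raises a RetryError wrapping the last failure. As written, the decorator retries on any exception, including 400 and 401, so in real code pass tenacity's retry argument to limit it to retryable errors.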
Quick recall
Which errors should you retry?
Quiz time
1. You should NOT retry