GeekHub Learn
Module 5: Using AI APIs (OpenAI, Gemini, Anthropic)
Lesson 5.7 · 7 of 8 in this module · 2 min read

Rate limits, retries, and resilient code

Your script worked five times. The sixth time it hit a rate limit and crashed your app. This lesson keeps your code calm when the API gets noisy.

Even the fanciest restaurant has a limit on how many orders the kitchen takes per minute. Your job is to wait politely, not bang on the counter.

Common failure modes:

  • 429 Rate limit: too many requests per minute (RPM) or tokens per minute (TPM)
  • 5xx server errors: usually transient problems on the provider's side
  • 400 Bad request: your fault (malformed input)
  • 401 Auth error: bad or expired key
  • Timeouts: network or model latency spike

The fix for 429 and 5xx: exponential backoff with jitter. Wait 1 second, then 2, then 4, then 8, with a small random offset.
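
A minimal sketch of that schedule in plain Python, assuming a 1-second base delay, a 30-second cap, and up to 25% of extra random wait. The name backoff_delay and its defaults are illustrative and not taken from any library.

```python
import random


def backoff_delay(attempt, base=1.0, cap=30.0, jitter=0.25, rng=random):
    """Seconds to wait before retry number `attempt` (0-based).

    The delay doubles on each attempt (1, 2, 4, 8, ...), is capped at `cap`,
    and then gets a random extra of up to `jitter` * delay added on top.
    """
    delay = min(cap, base * (2 ** attempt))
    return delay + rng.uniform(0, jitter * delay)


if __name__ == "__main__":
    for attempt in range(4):
        print(f"retry {attempt + 1}: wait {backoff_delay(attempt):.2f}s")
```

The random extra is what keeps many clients that failed at the same moment from all retrying at the same moment.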

Using tenacity:

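from openai import APITimeoutError, InternalServerError, OpenAI, RateLimitError
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential

client = OpenAI()  # reads OPENAI_API_KEY from the environment
TRANSIENT = (RateLimitError, InternalServerError, APITimeoutError)  # 429, 5xx, timeouts

# Retry only transient errors; 400 and 401 raise immediately.
@retry(retry=retry_if_exception_type(TRANSIENT), wait=wait_random_exponential(min=1, max=30), stop=stop_after_attempt(5))
def call_llm(messages):
    return client.chat.completions.create(model="gpt-4o-mini", messages=messages)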

Five tries at most, with randomized waits whose upper bound doubles each time (capped at 30 seconds), then surrender: tenacity raises a RetryError.

Visualize it

A graph: x = attempt number, y = wait time, with a jittered exponential curve.

Try it now

Trigger your own rate limit by calling the API in a tight loop. Note when 429 shows up. Implement backoff. Re-run. See it recover.

Hands-on lab

Wrap your ask() function with retries. Add structured logging of failures.
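
One possible starting point for the lab, written with only the standard library so the moving parts are visible. Everything here is an assumption for illustration: TransientError stands in for whatever retryable exceptions your SDK raises, with_retries is a hypothetical helper, and make_flaky_ask fakes an ask() that fails a few times before answering.

```python
import json
import logging
import random
import time

logger = logging.getLogger("llm.retry")


class TransientError(Exception):
    """Stand-in for retryable failures (429, 5xx, timeouts)."""


def with_retries(func, max_attempts=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Wrap `func` so transient failures are retried with jittered backoff."""

    def wrapper(*args, **kwargs):
        for attempt in range(1, max_attempts + 1):
            try:
                return func(*args, **kwargs)
            except TransientError as exc:
                giving_up = attempt == max_attempts
                # Full jitter: anywhere between 0 and the capped exponential delay.
                ceiling = min(cap, base * 2 ** (attempt - 1))
                wait = 0.0 if giving_up else random.uniform(0, ceiling)
                # One JSON object per failure keeps logs easy to search.
                logger.warning(json.dumps({
                    "event": "llm_call_failed",
                    "attempt": attempt,
                    "max_attempts": max_attempts,
                    "error": type(exc).__name__,
                    "detail": str(exc),
                    "wait_seconds": round(wait, 2),
                    "giving_up": giving_up,
                }))
                if giving_up:
                    raise
                sleep(wait)

    return wrapper


def make_flaky_ask(failures):
    """Hypothetical ask() that fails `failures` times, then answers."""
    state = {"calls": 0}

    def ask(prompt):
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("429 rate limit")
        return f"answer to: {prompt}"

    return ask


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING, format="%(message)s")
    ask = with_retries(make_flaky_ask(failures=2), sleep=lambda s: None)
    print(ask("hello"))
```

Any exception other than TransientError passes straight through, so a 400 or 401 surfaces on the first attempt instead of being retried.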

Try it now

Why is jitter important?

Common mistakes

  • Retrying on 400s (the same malformed request fails every time)
  • Retrying without backoff (you add load to an already overloaded API)
  • No max-retry limit (a long outage turns into an endless loop)
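
The first mistake comes down to one decision: is this status code worth another attempt? A small sketch of that check, assuming you have the HTTP status code in hand; should_retry is a hypothetical helper name.

```python
def should_retry(status_code):
    """Return True only for failures that a later attempt could fix."""
    if status_code == 429:         # rate limited: back off, then try again
        return True
    if 500 <= status_code <= 599:  # server-side error: usually transient
        return True
    return False                   # 400, 401 and other 4xx: fix the request instead


if __name__ == "__main__":
    for code in (400, 401, 429, 500, 503):
        print(code, "retry" if should_retry(code) else "do not retry")
```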

Debugging tip

If you keep hitting rate limits, the fix is often "smaller model", "batched calls", or "request a higher tier", not "retry harder".

Challenge

Add a "circuit breaker" pattern: after 10 consecutive failures, stop calling for 5 minutes and alert.
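
If you want a starting point, here is a deliberately simplified sketch. The class and parameter names are hypothetical, alerting is just a callable (print by default), and after the cooldown the breaker simply closes again rather than using the "half-open" trial state that fuller implementations add.

```python
import time


class CircuitOpenError(Exception):
    """Raised when calls are blocked because the breaker is open."""


class CircuitBreaker:
    def __init__(self, max_failures=10, cooldown=300.0,
                 clock=time.monotonic, alert=print):
        self.max_failures = max_failures
        self.cooldown = cooldown  # seconds to stay open (300 = 5 minutes)
        self.clock = clock        # injectable so tests need not wait
        self.alert = alert        # swap in email, chat, or a pager hook
        self.failures = 0         # consecutive failures so far
        self.opened_at = None     # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                raise CircuitOpenError("circuit open; skipping call")
            # Cooldown is over: close the circuit and start counting again.
            self.opened_at = None
            self.failures = 0
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
                self.alert(
                    f"circuit opened after {self.failures} consecutive failures"
                )
            raise
        self.failures = 0  # any success resets the streak
        return result
```

You would create one breaker per API and route calls through it, for example breaker.call(call_llm, messages), so that a dead upstream costs you one fast exception instead of a full round of retries on every request.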

Where this shows up

  • Production AI APIs
  • Batch processing jobs
  • Customer-facing chat under load

From the field

The single most common 2026 incident is "we hit rate limits and the whole pipeline died". Resilient retry code is what keeps you off the incident report.

Recap

APIs fail. Code defensively: backoff + jitter + max retries + don't retry 4xx errors other than 429. Adopt the pattern once and reuse it everywhere.

