Module 5: Using AI APIs (OpenAI, Gemini, Anthropic)

Module Goal

Move from "talking to ChatGPT in a browser" to "calling LLMs from your own code". By the end you will have written real Python scripts against three frontier APIs, with streaming, JSON, and cost control.

Estimated Duration

4 to 6 hours.

Skills Learned

Python basics for AI (just enough, in 30 minutes)
Installing SDKs and managing API keys safely
Making chat completions across OpenAI, Gemini, and Anthropic
Streaming responses to a user
Structured outputs and tool calls
Handling rate limits and retries
Comparing providers like an engineer

Real-world Importance

Nearly every AI product on the market is built on API calls. Owning this layer means you can build, debug, and switch providers without depending on a framework's leaky abstractions. Most examples use OpenAI because its SDK is the most widely documented, but every pattern also works on the free-tier providers: Google AI Studio (Gemini Flash, no card required), Groq (open Llama and Gemma models, no card required), Hugging Face Inference Providers, and Ollama running locally. The full free-stack matrix lives in the Tech Stack and Tools section.

Lessons in this module

Python for AI in 30 minutes
Setup: keys, env vars, .gitignore, virtual envs
Your first OpenAI call
The same call in Gemini and Claude
Streaming responses
Structured outputs across providers
Rate limits, retries, and resilient code
Cost control patterns

Lesson 5.1: Python for AI in 30 minutes

Hook / Why This Matters

You do not need to be a Python developer to use LLM APIs. You need eight building blocks. This lesson is the shortest possible path to all eight.

Beginner Analogy

Python for AI is like ordering food at a restaurant. You do not need to cook. You need to know the menu, point, and pay.

Concept Explanation

The eight building blocks you need:

Variables and types: x = "hi", n = 5, flag = True
Lists and dicts: [1,2,3], {"name": "Aman"}
f-strings: f"Hello {name}"
Functions: def greet(name): return f"Hi {name}"
Conditionals: if/elif/else
Loops: for x in items: ...
Imports: from openai import OpenAI
Environment variables: os.environ["OPENAI_API_KEY"]

That is it. Everything in this course uses just these.

Technical Breakdown

# 1. Variables
name = "Aman"
age = 23

# 2. Dict (the most important Python structure for AI)
user = {"name": "Aman", "skills": ["python", "react"]}

# 3. f-string
msg = f"Hi {user['name']}, you know {len(user['skills'])} skills."

# 4. Function
def shout(text):
    return text.upper()

# 5. Conditional
if age >= 18:
    print("adult")
else:
    print("minor")

# 6. Loop
for skill in user["skills"]:
    print(skill)

# 7. Import
import os

# 8. Env var
api_key = os.environ.get("OPENAI_API_KEY", "not-set")

Visual Learning Suggestion

A "Python cheat card" infographic with the 8 blocks, each as a small code snippet.

Interactive Element

Open colab.research.google.com. Paste the code above into a cell. Run it. Edit each block. Re-run.

Hands-on Lab

In Colab, write a function summarize_user(user_dict) that takes a dict like {"name": ..., "skills": [...]} and returns a one-sentence f-string. Run it on 3 sample dicts.

Mini Exercise

What is the difference between a list and a dict?

Common Mistakes

Confusing {} (dict) and [] (list)
Forgetting indentation matters in Python
Not using virtual environments

Debugging Tips

If you see KeyError, you accessed a missing dict key. If IndexError, a missing list index. If NameError, a typo or missing import.

Knowledge Check Questions

What is an f-string?
How do you import a library?
How do you read an environment variable safely?

Quiz Questions

user["name"] accesses: a) A list element b) A dict value c) A tuple d) A class attribute Answer: b

Challenge Task

Write a Python script that builds a messages list (as in Module 3) for a 3-turn conversation and prints it formatted.

Real-world Use Cases

Every AI API call uses these building blocks
Data prep for LLM inputs
Quick scripts to test prompts

Industry Insight

If you can write the 8 building blocks fluently, you can ship an AI app. Engineers waste months thinking they need "real Python" first. They do not.

Interview Questions

Show how you would call an API with an API key from an env var.
Explain f-strings.
What does if __name__ == "__main__": mean?
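The first and third interview questions fit in one small script. This is a sketch: `get_key` is a hypothetical helper, and the actual API call is left out so the script runs without a key.

```python
import os
from typing import Optional


def get_key(name: str = "OPENAI_API_KEY") -> Optional[str]:
    """Return the key from the environment, or None if it is not set."""
    # .get() returns None instead of raising KeyError when the variable is missing.
    return os.environ.get(name)


def main() -> None:
    key = get_key()
    if key is None:
        print("OPENAI_API_KEY is not set. Add it to your .env file.")
        return
    # Never print the full key; show only its last 4 characters.
    print(f"Key loaded (ends in ...{key[-4:]})")


# True only when this file is run directly (python script.py),
# not when another module imports it.
if __name__ == "__main__":
    main()
```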

Summary

Eight Python blocks. That is the prerequisite. You are ready.

Lesson 5.2: Setup: keys, env vars, .gitignore, virtual envs

Hook / Why This Matters

The single most expensive beginner mistake in AI is committing an API key to GitHub. Bots scan within minutes. Bills hit within hours. This lesson prevents that.

Beginner Analogy

API keys are house keys. You do not tape them to your front door. You do not paste them in a public chat. You do not push them to GitHub.

Concept Explanation

The safe setup pattern:

Generate an API key from the provider dashboard.
Store it in a local .env file.
Add .env to .gitignore BEFORE the first commit.
Load it via python-dotenv or os.environ.
Rotate keys quarterly.
Use spending limits in the provider dashboard.

Technical Breakdown

.env file (never commit):

OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...
ANTHROPIC_API_KEY=sk-ant-...

.gitignore (commit this):

.env
__pycache__/
*.pyc
.venv/

Loader code:

from dotenv import load_dotenv
import os
load_dotenv()
key = os.environ["OPENAI_API_KEY"]

Virtual env setup:

python -m venv .venv
source .venv/bin/activate    # Mac/Linux
.venv\Scripts\activate       # Windows
pip install openai google-genai anthropic python-dotenv tenacity

Visual Learning Suggestion

A 4-box "safe key flow" diagram: provider dashboard -> .env -> .gitignore protects it -> code loads via env var.

Interactive Element

Create your first OpenAI key at platform.openai.com/api-keys. Set a $5 monthly hard limit. (You will rarely exceed $2 in this course.)

Hands-on Lab

Set up a Python project with .env, .gitignore, and a requirements.txt. Push a clean repo to GitHub. Verify .env is not in the repo.
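To check the last step from code, a small helper can scan the .gitignore text. It is a rough sketch and understands only plain `.env` / `/.env` rules (plus a later `!.env` negation), not glob patterns. For the authoritative answer, `git check-ignore -v .env` prints the rule that ignores the file, and `git ls-files .env` prints nothing when .env is not tracked.

```python
from pathlib import Path


def ignores_env(gitignore_text: str) -> bool:
    """Return True if the .gitignore text contains a plain rule ignoring .env."""
    ignored = False
    for raw in gitignore_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        if line in (".env", "/.env"):
            ignored = True
        elif line in ("!.env", "!/.env"):
            # Git applies rules in order, so a later negation re-includes the file.
            ignored = False
    return ignored


def check_repo(root: str = ".") -> bool:
    """True if <root>/.gitignore exists and ignores .env."""
    path = Path(root) / ".gitignore"
    return path.exists() and ignores_env(path.read_text())
```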

Mini Exercise

What would you do if you accidentally committed a key?

Common Mistakes

Hard-coding the key string in .py files
Forgetting .gitignore before the first commit
Sharing keys across personal and work projects

Debugging Tips

If you ever accidentally commit a key, rotate it immediately at the provider dashboard. Git history keeps the key even after you delete the file, so the only real fix is to invalidate the key and issue a new one.

Knowledge Check Questions

Why use .env instead of hard-coding?
What is .gitignore?
Why a virtual env?

Quiz Questions

If you commit an API key by accident, the first move is: a) Delete the file b) Rotate the key at the provider c) Add it to .gitignore d) Force push Answer: b

Challenge Task

Add monthly spending limits on all three providers (OpenAI, Google, Anthropic). Screenshot proof for yourself.

Real-world Use Cases

All production AI apps
Personal projects pushed to GitHub
Tutorials and demos

Industry Insight

Reputable companies use secret managers (AWS Secrets Manager, GCP Secret Manager, Doppler) in production. For learning, env vars are enough. Promote later.

Interview Questions

How do you manage API keys safely?
What is a .env file?
What is a virtual environment and why use it?

Summary

Keys in .env, .env in .gitignore, spending caps on, virtual envs for project isolation. This is your baseline forever.

Lesson 5.3: Your first OpenAI call

Hook / Why This Matters

The single most-asked-about line of Python in 2026: how do I actually call ChatGPT from code? This lesson answers it.

Beginner Analogy

You are sending a polite letter to a smart pen friend. You include who you are, what you want, and where to send the reply. The API call is the envelope.

Concept Explanation

Three things every API call needs:

The model id (e.g. gpt-4o-mini)
The messages array (see Module 3.2)
Optional parameters (temperature, max_tokens, response_format)

The provider returns a structured response with the assistant message, token usage, and finish reason.
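Under the SDK's objects sits plain JSON. The dict below is a trimmed, invented example of the shape a Chat Completions response takes (real responses carry more fields, such as `id` and `created`); reading it shows where the reply, finish reason, and usage live.

```python
# A trimmed, invented example of a Chat Completions response body.
sample_response = {
    "model": "gpt-4o-mini",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "RAG looks things up before answering."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 31, "completion_tokens": 9, "total_tokens": 40},
}


def summarize(resp: dict) -> dict:
    """Pull out the three fields you will read most often."""
    choice = resp["choices"][0]
    return {
        "text": choice["message"]["content"],
        # "stop" = natural end; "length" = cut off by max_tokens.
        "finish_reason": choice["finish_reason"],
        "total_tokens": resp["usage"]["total_tokens"],
    }


print(summarize(sample_response))
```

On the SDK object the same fields are `response.choices[0].message.content`, `response.choices[0].finish_reason`, and `response.usage.total_tokens`; `response.model_dump()` converts the object into a dict of this shape.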

Technical Breakdown

from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a tutor. Reply in 2 sentences."},
        {"role": "user", "content": "Explain RAG to a beginner."},
    ],
    temperature=0.4,
    max_tokens=200,
)

print(response.choices[0].message.content)
print("Usage:", response.usage.total_tokens)

That is a complete, runnable program.

Visual Learning Suggestion

A request-response diagram: Python script -> HTTP POST with model + messages -> OpenAI -> JSON response -> Python prints.

Interactive Element

Run the above script. Change temperature to 0, 1, and 1.5. Note differences.

Hands-on Lab

Build an ask(question) function that wraps the API call. Use it in a one-command CLI: python ask.py "What is a transformer?". Keep it to 20 lines.
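One possible shape for the lab, as a sketch: `ask()` takes an optional client so you can test it with a fake object, and only builds a real `OpenAI()` client (after loading .env) when none is passed. The model and max_tokens values are the ones used earlier in this lesson.

```python
import sys


def ask(question, client=None, model="gpt-4o-mini"):
    """Send one question and return the reply text."""
    if client is None:
        # Imported lazily so the function can be tested without keys or network.
        from dotenv import load_dotenv
        from openai import OpenAI
        load_dotenv()
        client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
        max_tokens=200,
    )
    return response.choices[0].message.content


def main(argv):
    if not argv:
        print('usage: python ask.py "What is a transformer?"')
        return
    print(ask(" ".join(argv)))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Injecting the client is a small design choice that pays off later: the retry, caching, and provider-switching lessons all wrap this same function.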

Mini Exercise

How many tokens did your request use? Where is that printed?

Common Mistakes

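Using the wrong model id (a typo fails with a NotFoundError, so read the error before debugging anything else)
Setting max_tokens too low and getting truncated responses (finish_reason is "length" when this happens)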
Logging the full response object in production (it is big)

Debugging Tips

If you see AuthenticationError, your env var is missing or wrong. If RateLimitError, back off. If BadRequestError, your message format is malformed.

Knowledge Check Questions

What three inputs does every chat completion need?
Where is the assistant's reply in the response object?
Where is the token usage?

Quiz Questions

The assistant's text is at: a) response.text b) response.choices[0].message.content c) response.body d) response.completion Answer: b

Challenge Task

Build a compare_models(question) function that calls 3 different OpenAI models on the same question and prints the answers and token costs side by side.

Real-world Use Cases

Backend AI endpoints
Scripts for data enrichment
Quick experimentation in notebooks

Industry Insight

In 2026 production, teams rarely call the API directly inside request handlers. They wrap it in a service layer with retries, caching, and logging. Lessons 5.7 and 5.8 cover the retry and caching pieces.

Interview Questions

Walk me through a basic OpenAI API call.
How do you read token usage from the response?
What is the difference between gpt-4o and gpt-4o-mini?

Summary

Three lines: import, client, create. You now ship code that calls ChatGPT.

Lesson 5.4: The same call in Gemini and Claude

Hook / Why This Matters

Multi-provider literacy is a 2026 must. Outages, price changes, and capability differences all favor engineers who can swap providers in an hour, not a month.

Beginner Analogy

Three taxi apps in your city. The button is in slightly different places but the trip is the same. You should not become loyal to one because of the button.

Concept Explanation

All three APIs follow the same pattern with slight syntactic differences. The mental model is identical: messages in, response out, usage tracked.

Technical Breakdown

Google Gemini:

import os
from dotenv import load_dotenv
from google import genai

load_dotenv()
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain RAG to a beginner in 2 sentences.",
)
print(response.text)

Anthropic Claude:

from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv()
client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=200,
    system="You are a tutor. Reply in 2 sentences.",
    messages=[{"role": "user", "content": "Explain RAG to a beginner."}],
)
print(message.content[0].text)

Key differences:

Aspect	OpenAI	Gemini	Anthropic
SDK name	`openai`	`google-genai`	`anthropic`
System message	role in messages	`system_instruction` in `config`	top-level `system` param
Response access	`.choices[0].message.content`	`.text`	`.content[0].text`
Streaming	yes	yes	yes
JSON Schema	yes	yes	yes (tools)
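The system-message row is the one that bites when porting. Below is a minimal sketch that converts one OpenAI-style messages list into the argument shapes the other two SDKs take. Assumptions: plain-text messages only; Gemini contents passed as dicts with `role` and `parts`, with the assistant role renamed to `model`. `to_anthropic` and `to_gemini` are hypothetical helper names.

```python
def to_anthropic(messages):
    """Split out system text; Anthropic takes it as a top-level `system` argument."""
    system = "\n".join(m["content"] for m in messages if m["role"] == "system")
    rest = [{"role": m["role"], "content": m["content"]} for m in messages if m["role"] != "system"]
    kwargs = {"messages": rest}
    if system:
        kwargs["system"] = system
    return kwargs


def to_gemini(messages):
    """Gemini calls the assistant role 'model' and takes system text in the config."""
    system = "\n".join(m["content"] for m in messages if m["role"] == "system")
    contents = [
        {"role": "model" if m["role"] == "assistant" else "user", "parts": [{"text": m["content"]}]}
        for m in messages
        if m["role"] != "system"
    ]
    config = {"system_instruction": system} if system else None
    return {"contents": contents, "config": config}
```

Usage: `client.messages.create(model=..., max_tokens=200, **to_anthropic(msgs))` for Claude and `client.models.generate_content(model="gemini-2.5-flash", **to_gemini(msgs))` for Gemini.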

Visual Learning Suggestion

A 3-column table comparing the three APIs across 6 dimensions (auth, system msg, streaming, JSON, tools, multimodal).

Interactive Element

Run the same prompt through all three providers. Note differences in tone, length, format.

Hands-on Lab

Write a single Python file with ask_openai(q), ask_gemini(q), ask_claude(q). Run the same prompt through each. Save outputs for comparison.

Mini Exercise

Which provider has the largest free tier as of today? (Check Google AI Studio.)

Common Mistakes

Hard-coding provider-specific behavior throughout an app
Forgetting that Claude requires max_tokens
Comparing free-tier Gemini Flash with paid GPT-4o and blaming the provider for what is really a model-tier difference

Debugging Tips

When porting between providers, isolate the call in a thin adapter. Diffs become trivial.

Knowledge Check Questions

Where does Claude take the system instruction?
What is Gemini's free tier good for?
Which response access pattern feels most familiar to you?

Quiz Questions

Anthropic Claude's system instruction is sent as: a) The first message in the array b) A top-level system parameter c) Inside the user message d) Not supported Answer: b

Challenge Task

Build a "provider abstraction" layer with a single chat(provider, messages) function that works across all three.
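One possible shape for the abstraction, sketched with a registry of plain functions so it runs without keys. The provider names, `register`, `chat`, and `chat_with_fallback` are hypothetical; in a real app each registered function would wrap one SDK call, such as the ask_openai function from the lab. The fallback helper covers the multi-provider fallback use case: try providers in order and move on when one raises.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
Provider = Callable[[List[Message]], str]

PROVIDERS: Dict[str, Provider] = {}


def register(name: str):
    """Decorator that adds a provider function to the registry."""
    def wrap(fn: Provider) -> Provider:
        PROVIDERS[name] = fn
        return fn
    return wrap


def chat(provider: str, messages: List[Message]) -> str:
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    return PROVIDERS[provider](messages)


def chat_with_fallback(order: List[str], messages: List[Message]) -> str:
    """Try providers in order; return the first successful reply."""
    errors = []
    for name in order:
        try:
            return chat(name, messages)
        except Exception as exc:  # in real code, catch only transient SDK errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-ins for real SDK wrappers, so the sketch runs offline.
@register("echo")
def echo(messages):
    return messages[-1]["content"]


@register("down")
def down(messages):
    raise ConnectionError("simulated outage")
```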

Real-world Use Cases

Multi-provider fallback for resilience
Price-aware routing
Vendor lock-in avoidance

Industry Insight

The 2026 trend is provider-agnostic apps: route easy queries to cheap providers, hard ones to frontier. Knowing the SDKs natively (not via framework wrappers) makes you the engineer the team trusts.

Interview Questions

Compare the three frontier provider SDKs.
How would you design a provider-agnostic wrapper?
When would you use Gemini over OpenAI?

Summary

Same shape, different buttons. Learn all three. It will take you one weekend and pay back forever.

Lesson 5.5: Streaming responses

Hook / Why This Matters

Why does ChatGPT feel snappy and your beginner script feel slow? Streaming. This lesson upgrades your apps from "AI wait" to "AI flow".

Beginner Analogy

Watching a video buffer vs streaming. Same content, totally different feel. You will never want non-streaming UX after this lesson.

Concept Explanation

Set stream=True. The SDK returns an iterator of token chunks. You print them as they arrive. The user sees instant feedback. Perceived latency drops dramatically even if total time is identical.

Technical Breakdown

OpenAI streaming:

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a 3-sentence story."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()

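Gemini and Anthropic have analogous patterns: Gemini's client.models.generate_content_stream(...) yields chunks with a .text attribute, and Anthropic's client.messages.stream(...) is a context manager whose text_stream yields text pieces.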
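To see what the OpenAI streaming loop is doing, and to measure the number streaming actually improves (time to first token), here is a sketch that consumes any iterable of OpenAI-style chunks. It runs offline against fake chunks built with SimpleNamespace; in your script, pass the real `stream` object instead. `consume_stream` and `fake_stream` are hypothetical helper names.

```python
import time
from types import SimpleNamespace


def consume_stream(stream, echo=True):
    """Print deltas as they arrive; return (full_text, seconds_to_first_token)."""
    start = time.perf_counter()
    first_token_at = None
    parts = []
    for chunk in stream:
        if not chunk.choices:  # e.g. a final usage-only chunk has an empty choices list
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks (often the first and last) carry None
            if first_token_at is None:
                first_token_at = time.perf_counter() - start
            parts.append(delta)
            if echo:
                print(delta, end="", flush=True)
    if echo:
        print()
    return "".join(parts), first_token_at


def fake_stream(words):
    """Yield chunks shaped like the SDK's: chunk.choices[0].delta.content."""
    yield SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))])
    for w in words:
        yield SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=w))])
```

Usage: `text, ttft = consume_stream(fake_stream(["Once ", "upon ", "a time."]))`. Compare `ttft` with the total time of a non-streaming call to quantify the perceived-latency win.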

Visual Learning Suggestion

A side-by-side animation suggestion: non-streaming (silent for 4 seconds then full text) vs streaming (text appears token by token starting at 0.3 seconds).

Interactive Element

Run the streaming snippet. Now run the non-streaming version. Feel the latency difference.

Hands-on Lab

Convert your ask(question) function from Lesson 5.3 to stream. Print tokens as they arrive.

Mini Exercise

Why does streaming reduce perceived latency but not total tokens or cost?

Common Mistakes

Forgetting flush=True (output buffers awkwardly)
Not handling cancellations (user closes tab, stream keeps running, you pay)
Measuring only total latency instead of time to first token, which is the number streaming actually improves

Debugging Tips

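If output stops early, check the last chunk's finish_reason: "length" means max_tokens cut it off, while a stream that ends with no finish_reason usually means the connection dropped. Catch the error and retry, or ask the model to continue from the partial text.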

Knowledge Check Questions

What changes in the API call to enable streaming?
Why does streaming feel faster?
How do you reduce wasted cost when the user cancels?

Quiz Questions

The total token cost of a streamed response vs non-streamed is: a) Higher b) Lower c) The same (for the tokens actually generated) d) Unpredictable Answer: c

Challenge Task

Build a tiny terminal chat app that streams. Add Ctrl+C to interrupt.
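A sketch of the interrupt part: catch KeyboardInterrupt inside the loop, close the stream so the connection is released, and keep whatever text already arrived. The OpenAI SDK's streaming object has a `close()` method; the fake stream here is a generator, which also has `close()`. `stream_until_interrupted` is a hypothetical name.

```python
from types import SimpleNamespace


def stream_until_interrupted(stream):
    """Collect streamed text; on Ctrl+C, close the stream and return the partial reply."""
    parts = []
    try:
        for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content:
                parts.append(chunk.choices[0].delta.content)
                print(parts[-1], end="", flush=True)
    except KeyboardInterrupt:
        stream.close()  # stop reading and release the connection
        print("\n[interrupted]")
    return "".join(parts)


def fake_stream_with_interrupt(words):
    """Yield a few chunks, then simulate the user pressing Ctrl+C."""
    for w in words:
        yield SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=w))])
    raise KeyboardInterrupt
```

In the terminal app, call this inside your `input()` loop and append the partial reply to the history, so the next turn knows what the user already saw.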

Real-world Use Cases

All chat UIs
Long-form generation tasks
Live coding assistants

Industry Insight

In 2026 every chat product streams. Non-streaming feels broken. The infra cost is the same. Adopt streaming on day one.

Interview Questions

How does streaming work in the OpenAI SDK?
How would you cancel a stream mid-flight?
How does streaming affect cost?

Summary

Streaming is a free UX win. Always default on for chat.

Lesson 5.6: Structured outputs across providers

Hook / Why This Matters

Module 4.5 taught the why. This lesson covers the how across all three major providers.

Beginner Analogy

Same shipping label format, three different printers. You need to know which buttons make each printer behave.

Concept Explanation

All three providers support enforced JSON outputs:

OpenAI: response_format={"type": "json_schema", "json_schema": {...}}
Gemini: config={"response_mime_type": "application/json", "response_schema": ...} passed to generate_content
Anthropic: tool use with a tool that takes a typed input matching your schema
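Here is the OpenAI setting from the list above, spelled out as a plain Python dict. This is a sketch: `resume_schema`, the name "resume", and `openai_response_format` are illustrative. OpenAI's strict mode expects every property to appear in `required` and `additionalProperties` to be false on each object, which is why the schema sets both.

```python
import json

# Illustrative schema for the resume example used in this lesson.
resume_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "skills": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "skills"],
    "additionalProperties": False,  # strict mode requires this on every object
}


def openai_response_format(name, schema):
    """Build the response_format argument for chat.completions.create."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": schema},
    }


fmt = openai_response_format("resume", resume_schema)
print(json.dumps(fmt, indent=2))
```

Pass it as `response_format=fmt`, then parse the reply with `json.loads(response.choices[0].message.content)`.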

Technical Breakdown

Anthropic example (tool-as-schema pattern):

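from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tool = {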
    "name": "save_resume",
    "description": "Save extracted resume fields",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "skills": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["name", "skills"],
    },
}

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=400,
    tools=[tool],
    tool_choice={"type": "tool", "name": "save_resume"},
    messages=[{"role": "user", "content": "Aman, Python and React."}],
)
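# With a forced tool_choice the reply contains a tool_use block; its .input is the parsed dict.
tool_call = next(block for block in message.content if block.type == "tool_use")
print(tool_call.input)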

The model is forced to call save_resume with structured args matching the schema.

Visual Learning Suggestion

3-column comparison: same schema, three SDK calls, identical resulting JSON.

Interactive Element

Use any provider's playground to set a JSON schema and watch it enforce.

Hands-on Lab

Write an extract_resume(text, provider) function that produces the same JSON shape from all three providers.

Mini Exercise

Why does Anthropic use tools to enforce schemas rather than a response_format flag?

Common Mistakes

Forgetting strict: true (OpenAI) or required fields
Letting the model add freestyle commentary alongside JSON
Mixing schemas across versions

Debugging Tips

If you still see invalid JSON, you are either not in structured mode or your schema has loopholes (missing required, extra properties).
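Even in structured mode, a cheap check at the boundary catches schema loopholes before bad data spreads. This is a deliberately tiny sketch that handles only the subset used in this lesson (object, string, array, required, additionalProperties: false). For real projects, a full JSON Schema validator such as the jsonschema package is the usual choice.

```python
import json


def check(value, schema, path="$"):
    """Return a list of problems; an empty list means the value fits the schema subset."""
    kind = schema.get("type")
    if kind == "string":
        return [] if isinstance(value, str) else [f"{path}: expected string"]
    if kind == "array":
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        problems = []
        for i, item in enumerate(value):
            problems += check(item, schema.get("items", {}), f"{path}[{i}]")
        return problems
    if kind == "object":
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        props = schema.get("properties", {})
        problems = [f"{path}: missing '{k}'" for k in schema.get("required", []) if k not in value]
        if schema.get("additionalProperties") is False:
            problems += [f"{path}: unexpected '{k}'" for k in value if k not in props]
        for k, sub in props.items():
            if k in value:
                problems += check(value[k], sub, f"{path}.{k}")
        return problems
    return []  # types outside this subset are not checked


schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "skills": {"type": "array", "items": {"type": "string"}}},
    "required": ["name", "skills"],
    "additionalProperties": False,
}
print(check(json.loads('{"name": "Aman", "skills": ["Python", 3]}'), schema))
```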

Knowledge Check Questions

How does OpenAI enforce JSON schemas?
How does Anthropic achieve the same?
Why are structured outputs important?

Quiz Questions

The most reliable structured output mode in Anthropic is: a) Asking nicely in the prompt b) Tool use with input_schema c) Setting response_format d) Using a custom parser Answer: b

Challenge Task

Build a multi-provider structured extractor that works for both invoices and resumes via the same extract(schema, text) function.

Real-world Use Cases

Resume parsing (the Module 7 project)
Invoice extraction
Form auto-fill from chat
Agent planning that emits structured plans

Industry Insight

Structured outputs collapse the gap between LLMs and traditional systems. Mastering them is the move from "AI demo" to "AI in production".

Interview Questions

Compare structured output approaches across providers.
When would you not use structured outputs?
What happens if your schema is too loose?

Summary

All three providers enforce structured outputs, with slightly different ergonomics. Same schema, three buttons. Master all three.

Lesson 5.7: Rate limits, retries, and resilient code

Hook / Why This Matters

Your script worked five times. The sixth time it hit a rate limit and crashed your app. This lesson keeps your code calm when the API gets noisy.

Beginner Analogy

Even the fanciest restaurant has a limit on how many orders the kitchen takes per minute. Your job is to wait politely, not bang on the counter.

Concept Explanation

Common failure modes:

429 Rate limit: too many requests per minute (RPM) or tokens per minute (TPM)
5xx server errors: transient
400 Bad request: your fault (malformed input)
401 Auth error: bad or expired key
Timeouts: network or model latency spike

The fix for 429 and 5xx: exponential backoff with jitter. Wait 1 second, then 2, then 4, then 8, with a small random offset.
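Before reaching for a library, the idea fits in a few lines. This is a sketch: `TransientError` stands in for your SDK's 429/5xx exception classes, and `sleep` is injectable so the example runs without actually waiting. The jitter here adds up to 25% of each wait; tenacity's wait_random_exponential instead picks a random wait up to the exponential bound ("full jitter"), another common variant.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for retryable SDK errors (429 rate limit, 5xx, timeouts)."""


def retry_with_backoff(fn, max_attempts=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Call fn(); on TransientError wait base * 2**attempt seconds plus jitter, then retry.

    Any other exception (for example a 400 bad request) propagates immediately.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(cap, base * 2 ** attempt)  # 1, 2, 4, 8, ... capped at `cap`
            sleep(delay + random.uniform(0, delay / 4))  # jitter spreads clients apart
```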

Technical Breakdown

Using tenacity:

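import openai
from openai import OpenAI
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential

client = OpenAI()
# Retry only transient errors (429, 5xx, timeouts); 400/401 errors raise immediately.
TRANSIENT = (openai.RateLimitError, openai.InternalServerError, openai.APIConnectionError)

@retry(retry=retry_if_exception_type(TRANSIENT), wait=wait_random_exponential(min=1, max=30), stop=stop_after_attempt(5), reraise=True)
def call_llm(messages, model="gpt-4o-mini"):
    return client.chat.completions.create(model=model, messages=messages)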

Five tries, exponentially increasing waits, then surrender.

Visual Learning Suggestion

A graph: x = attempt number, y = wait time, with a jittered exponential curve.

Interactive Element

Trigger your own rate limit by calling the API in a tight loop. Note when 429 shows up. Implement backoff. Re-run. See it recover.

Hands-on Lab

Wrap your ask() function with retries. Add structured logging of failures.

Mini Exercise

Why is jitter important?

Common Mistakes

Retrying on 400s (the request is malformed, so every retry fails the same way)
Retrying without backoff (you make it worse)
No max-retry limit

Debugging Tips

If you keep hitting rate limits, the fix is often "smaller model", "batched calls", or "request a higher tier", not "retry harder".

Knowledge Check Questions

Which errors should you retry?
What is exponential backoff?
Why jitter?

Quiz Questions

You should NOT retry: a) 429 rate limit b) 502 bad gateway c) 400 bad request d) 503 service unavailable Answer: c

Challenge Task

Add a "circuit breaker" pattern: after 10 consecutive failures, stop calling for 5 minutes and alert.
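Here is a minimal sketch of the breaker, assuming a single-process app. `CircuitBreaker` and its parameters are hypothetical, and the "alert" is just a print that you would swap for your paging or logging tool. The clock is injectable so the example can fast-forward time.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling the API while the breaker is open."""


class CircuitBreaker:
    def __init__(self, threshold=10, cooldown=300.0, clock=time.monotonic):
        self.threshold = threshold  # consecutive failures before opening
        self.cooldown = cooldown    # seconds to stay open (300 s = 5 minutes)
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                raise CircuitOpenError("circuit open; skipping call")
            self.opened_at = None  # cooldown over: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:  # a failed trial call reopens at once
                self.opened_at = self.clock()
                print(f"ALERT: {self.failures} consecutive failures, pausing calls")
            raise
        self.failures = 0  # any success resets the count
        return result
```

Usage: `breaker = CircuitBreaker()` once at startup, then `breaker.call(call_llm, messages)` wherever you would call the API directly.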

Real-world Use Cases

Production AI APIs
Batch processing jobs
Customer-facing chat under load

Industry Insight

One of the most common 2026 incidents is "we hit rate limits and the whole pipeline died". Resilient retry code is what keeps you off the incident report.

Interview Questions

Implement exponential backoff with jitter.
Which errors should you not retry?
What is a circuit breaker?

Summary

APIs fail. Code defensively: backoff + jitter + max retries, and never retry 4xx errors other than 429. Adopt the pattern once and reuse it everywhere.

Lesson 5.8: Cost control patterns

Hook / Why This Matters

Final lesson of the module. Cost control is where engineers earn or lose their jobs. Master these five patterns and your apps stay cheap.

Beginner Analogy

Five money-saving habits at a restaurant: order what you need, share, take leftovers home, check the bill, set a monthly cap. Same five habits apply to LLM apps.

Concept Explanation

The five cost control patterns:

Right-size the model: smaller model for easy tasks.
Cache repeated calls: identical input -> stored output.
Compress prompts: drop boilerplate, summarize history.
Use prompt caching: providers reuse prefix tokens at lower cost.
Hard caps: monthly spending limits at the provider dashboard.
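For the "summarize history" half of prompt compression, the simplest first step is trimming: keep the system message and only the most recent turns. This is a sketch. `trim_history` and the default of 6 messages are illustrative, and a production app might summarize the dropped turns instead of discarding them.

```python
def trim_history(messages, keep_last=6):
    """Keep every system message plus the last `keep_last` non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    if keep_last <= 0:
        return system  # note: others[-0:] would return the whole list
    return system + others[-keep_last:]
```

Choose `keep_last` so the kept window starts on a user turn, which keeps user/assistant pairs together.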

Technical Breakdown

A simple memory cache:

import hashlib, json

_cache = {}

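def cached_ask(messages, model="gpt-4o-mini"):
    # Hash the model AND the messages: the same prompt sent to another model must not hit the cache.
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    key = hashlib.sha256(payload.encode()).hexdigest()
    if key in _cache:
        return _cache[key]
    out = call_llm(messages, model=model)  # the retrying wrapper from Lesson 5.7
    _cache[key] = out
    return out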

In production, replace the dict with Redis or Cloudflare KV.

Visual Learning Suggestion

A waterfall chart: original cost -> after right-sizing -> after caching -> after compression -> after prompt caching, each step dropping the bar.

Interactive Element

Take any prompt you wrote this module. Estimate its monthly cost at 1,000 users x 5 calls/day. Apply each pattern. Re-estimate.
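Here is the arithmetic for this exercise as a sketch. The per-million-token prices below are placeholders, not current rates, so copy real numbers from your provider's pricing page. The token counts per call are also assumptions.

```python
def monthly_cost(users, calls_per_day, in_tokens, out_tokens, price_in_per_m, price_out_per_m, days=30):
    """Estimated monthly spend in dollars for one call pattern."""
    calls = users * calls_per_day * days
    per_call = (in_tokens * price_in_per_m + out_tokens * price_out_per_m) / 1_000_000
    return calls * per_call


# Placeholder prices (USD per 1M tokens); replace with your provider's real rates.
PRICE_IN, PRICE_OUT = 0.50, 2.00

baseline = monthly_cost(1_000, 5, in_tokens=1_500, out_tokens=300, price_in_per_m=PRICE_IN, price_out_per_m=PRICE_OUT)
compressed = monthly_cost(1_000, 5, in_tokens=600, out_tokens=300, price_in_per_m=PRICE_IN, price_out_per_m=PRICE_OUT)
print(f"baseline ${baseline:,.2f}/month, after compression ${compressed:,.2f}/month")
```

With these placeholder numbers, trimming the input from 1,500 to 600 tokens takes the estimate from $202.50 to $135.00 a month. Rerun the function once per pattern you apply.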

Hands-on Lab

Add caching to your ask() function. Re-run the same prompt twice. Confirm the second call is free.

Mini Exercise

Why is response caching a poor fit for chat calls that use temperature > 0?

Common Mistakes

Caching responses that include time-sensitive info
Forgetting that different users may need different responses
Not setting hard spending caps (every horror story starts here)

Debugging Tips

If your bill spikes, the fix is usually one of: model too big, no caching, history grew unboundedly, no spending cap. Check in that order.

Knowledge Check Questions

Name three cost control patterns.
Why is prompt caching useful for shared system prompts?
When does response caching fail?

Quiz Questions

The fastest cost saving in most apps is: a) Disable streaming b) Switch to a smaller model for non-critical paths c) Use temperature 0 d) Compress JSON Answer: b

Challenge Task

Audit one of your own AI scripts. Apply the five patterns. Report the projected cost drop.

Real-world Use Cases

Cost-aware production apps
Internal AI tools with strict budgets
Free-tier consumer apps

Industry Insight

The engineer who can cut AI costs 80% without quality loss is the most valuable person on any AI team. This skill compounds quarterly as your usage scales.

Interview Questions

What patterns do you use to control LLM cost?
How do you decide between two model sizes?
When would you cache LLM responses?

Summary

Right-size, cache, compress, prompt-cache, cap. Five habits. Apply forever.

Module 5 Recap

You can call the OpenAI, Gemini, and Anthropic APIs, stream, get structured outputs, handle errors, and control costs. You are no longer dependent on ChatGPT's web UI. You can build.

Next Module

Module 6: Building Your First AI Chat App