Module 5: Using AI APIs (OpenAI, Gemini, Anthropic)
Module Goal
Move from "talking to ChatGPT in a browser" to "calling LLMs from your own code". By the end you will have written real Python scripts against three frontier APIs, with streaming, JSON, and cost control.
Estimated Duration
4 to 6 hours.
Skills Learned
- Python basics for AI (just enough, in 30 minutes)
- Installing SDKs and managing API keys safely
- Making chat completions across OpenAI, Gemini, and Anthropic
- Streaming responses to a user
- Structured outputs and tool calls
- Handling rate limits and retries
- Comparing providers like an engineer
Real-world Importance
Every AI product on the market is built on API calls. Owning this layer means you can build, debug, and switch providers without depending on a framework's leaky abstractions.
Lessons in this module
- Python for AI in 30 minutes
- Setup: keys, env vars, .gitignore, virtual envs
- Your first OpenAI call
- The same call in Gemini and Claude
- Streaming responses
- Structured outputs across providers
- Rate limits, retries, and resilient code
- Cost control patterns
Lesson 5.1: Python for AI in 30 minutes
Hook / Why This Matters
You do not need to be a Python developer to use LLM APIs. You need eight building blocks. This lesson is the shortest possible path to all eight.
Beginner Analogy
Python for AI is like ordering food at a restaurant. You do not need to cook. You need to know the menu, point, and pay.
Concept Explanation
The eight building blocks you need:
- Variables and types: x = "hi", n = 5, flag = True
- Lists and dicts: [1,2,3], {"name": "Aman"}
- f-strings: f"Hello {name}"
- Functions: def greet(name): return f"Hi {name}"
- Conditionals: if/elif/else
- Loops: for x in items: ...
- Imports: from openai import OpenAI
- Environment variables: os.environ["OPENAI_API_KEY"]
That is it. Everything in this course uses just these.
Technical Breakdown
# 1. Variables
name = "Aman"
age = 23
# 2. Dict (the most important Python structure for AI)
user = {"name": "Aman", "skills": ["python", "react"]}
# 3. f-string
msg = f"Hi {user['name']}, you know {len(user['skills'])} skills."
# 4. Function
def shout(text):
    return text.upper()
# 5. Conditional
if age >= 18:
    print("adult")
else:
    print("minor")
# 6. Loop
for skill in user["skills"]:
    print(skill)
# 7. Import
import os
# 8. Env var
api_key = os.environ.get("OPENAI_API_KEY", "not-set")
Visual Learning Suggestion
A "Python cheat card" infographic with the 8 blocks, each as a small code snippet.
Interactive Element
Open colab.research.google.com. Paste the code above into a cell. Run it. Edit each block. Re-run.
Hands-on Lab
In Colab, write a function summarize_user(user_dict) that takes a dict like {"name": ..., "skills": [...]} and returns a one-sentence f-string. Run it on 3 sample dicts.
Mini Exercise
What is the difference between a list and a dict?
Common Mistakes
- Confusing {} (dict) and [] (list)
- Forgetting that indentation matters in Python
- Not using virtual environments
Debugging Tips
If you see KeyError, you accessed a missing dict key. If IndexError, a missing list index. If NameError, a typo or missing import.
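A minimal sketch of the three errors named above, plus one way to sidestep the first two. The sample dict mirrors the one used earlier in this lesson; describe_error and safe_first_skill are hypothetical helper names chosen for illustration.

```python
user = {"name": "Aman", "skills": ["python", "react"]}

def describe_error(action):
    """Run a zero-argument callable and report which exception it raised."""
    try:
        action()
    except (KeyError, IndexError, NameError) as exc:
        return type(exc).__name__
    return "no error"

# KeyError: the dict has no "email" key.
print(describe_error(lambda: user["email"]))
# IndexError: the list has only two items (indexes 0 and 1).
print(describe_error(lambda: user["skills"][5]))
# NameError: the name was never defined or imported.
print(describe_error(lambda: undefined_name))

def safe_first_skill(user_dict):
    """Return the first skill, or None when the key or the item is missing."""
    skills = user_dict.get("skills", [])  # .get avoids KeyError
    return skills[0] if skills else None  # the emptiness check avoids IndexError
```

Reaching for .get() with a default is usually enough when a missing key is an expected case rather than a bug.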
Knowledge Check Questions
- What is an f-string?
- How do you import a library?
- How do you read an environment variable safely?
Quiz Questions
- user["name"] accesses: a) A list element b) A dict value c) A tuple d) A class attribute Answer: b
Challenge Task
Write a Python script that builds a messages list (as in Module 3) for a 3-turn conversation and prints it formatted.
Real-world Use Cases
- Every AI API call uses these building blocks
- Data prep for LLM inputs
- Quick scripts to test prompts
Industry Insight
If you can write the 8 building blocks fluently, you can ship an AI app. Engineers waste months thinking they need "real Python" first. They do not.
Interview Questions
- Show how you would call an API with an API key from an env var.
- Explain f-strings.
- What does if __name__ == "__main__": mean?
Summary
Eight Python blocks. That is the prerequisite. You are ready.
Lesson 5.2: Setup: keys, env vars, .gitignore, virtual envs
Hook / Why This Matters
The single most expensive beginner mistake in AI is committing an API key to GitHub. Bots scan within minutes. Bills hit within hours. This lesson prevents that.
Beginner Analogy
API keys are house keys. You do not tape them to your front door. You do not paste them in a public chat. You do not push them to GitHub.
Concept Explanation
The safe setup pattern:
- Generate an API key from the provider dashboard.
- Store it in a local .env file.
- Add .env to .gitignore BEFORE the first commit.
- Load it via python-dotenv or os.environ.
- Rotate keys quarterly.
- Use spending limits in the provider dashboard.
Technical Breakdown
.env file (never commit):
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...
ANTHROPIC_API_KEY=sk-ant-...
.gitignore (commit this):
.env
__pycache__/
*.pyc
.venv/
Loader code:
from dotenv import load_dotenv
import os
load_dotenv()
key = os.environ["OPENAI_API_KEY"]
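The loader above raises a bare KeyError when the variable is missing. One possible refinement, sketched here with two hypothetical helpers (require_env and mask), is to fail with a readable message and to avoid ever printing the full key.

```python
import os

def require_env(name):
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value

def mask(secret, visible=4):
    """Show only the last few characters so logs never contain the full key."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

A script could then call mask(require_env("OPENAI_API_KEY")) when it wants to confirm in a log line that a key was loaded.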
Virtual env setup:
python -m venv .venv
source .venv/bin/activate # Mac/Linux
.venv\Scripts\activate # Windows
pip install openai python-dotenv
Visual Learning Suggestion
A 4-box "safe key flow" diagram: provider dashboard -> .env -> .gitignore protects it -> code loads via env var.
Interactive Element
Create your first OpenAI key at platform.openai.com/api-keys. Set a $5 monthly hard limit. (You will rarely exceed $2 in this course.)
Hands-on Lab
Set up a Python project with .env, .gitignore, and a requirements.txt. Push a clean repo to GitHub. Verify .env is not in the repo.
Mini Exercise
What would you do if you accidentally committed a key?
Common Mistakes
- Hard-coding the key string in .py files
- Forgetting .gitignore before the first commit
- Sharing keys across personal and work projects
Debugging Tips
If you ever accidentally commit a key, rotate it immediately at the provider dashboard. GitHub history is forever, so deleting the file or force-pushing is not enough. The only reliable fix is to invalidate the key.
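A rough pre-commit sanity check can catch a hard-coded key before it reaches GitHub. The sketch below assumes keys look like the "sk-" examples in the .env file above; real key formats vary by provider, so treat the pattern and both function names as illustrative rather than exhaustive.

```python
import re
from pathlib import Path

# Assumption: keys start with "sk-" followed by a long token, as in the .env example.
KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_\-]{20,}")

def find_key_like_strings(text):
    """Return every substring that looks like an API key."""
    return KEY_PATTERN.findall(text)

def scan_python_files(root="."):
    """Map each .py file under root to the key-like strings found in it."""
    hits = {}
    for path in Path(root).rglob("*.py"):
        if ".venv" in path.parts:  # skip installed packages
            continue
        found = find_key_like_strings(path.read_text(errors="ignore"))
        if found:
            hits[str(path)] = found
    return hits
```

Running scan_python_files() from the project root before the first push gives an empty dict when nothing suspicious is found.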
Knowledge Check Questions
- Why use .env instead of hard-coding?
- What is .gitignore?
- Why a virtual env?
Quiz Questions
- If you commit an API key by accident, the first move is: a) Delete the file b) Rotate the key at the provider c) Add it to .gitignore d) Force push Answer: b
Challenge Task
Add monthly spending limits on all three providers (OpenAI, Google, Anthropic). Screenshot proof for yourself.
Real-world Use Cases
- All production AI apps
- Personal projects pushed to GitHub
- Tutorials and demos
Industry Insight
Reputable companies use secret managers (AWS Secrets Manager, GCP Secret Manager, Doppler) in production. For learning, env vars are enough; move to a secret manager when you go to production.
Interview Questions
- How do you manage API keys safely?
- What is a .env file?
- What is a virtual environment and why use it?
Summary
Keys in .env, .env in .gitignore, spending caps on, virtual envs for project isolation. This is your baseline forever.
Lesson 5.3: Your first OpenAI call
Hook / Why This Matters
The single most-asked-about line of Python in 2026: how do I actually call ChatGPT from code? This lesson answers it.
Beginner Analogy
You are sending a polite letter to a smart pen friend. You include who you are, what you want, and where to send the reply. The API call is the envelope.
Concept Explanation
Three things every API call needs:
- The model id (e.g. gpt-4o-mini)
- The messages array (see Module 3.2)
- Optional parameters (temperature, max_tokens, response_format)
The provider returns a structured response with the assistant message, token usage, and finish reason.
Technical Breakdown
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a tutor. Reply in 2 sentences."},
        {"role": "user", "content": "Explain RAG to a beginner."},
    ],
    temperature=0.4,
    max_tokens=200,
)
print(response.choices[0].message.content)
print("Usage:", response.usage.total_tokens)
That is a complete, runnable program.
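The response object carries more than the reply text. As a sketch, the hypothetical helper summarize_response below collects the assistant message, finish reason, and token usage in one dict. It runs against a hand-built stand-in with the same attribute names, so it needs no API key; with a real call you would pass the response returned by client.chat.completions.create.

```python
from types import SimpleNamespace

def summarize_response(response):
    """Pull the fields most scripts need out of a chat completion response."""
    choice = response.choices[0]
    return {
        "text": choice.message.content,
        "finish_reason": choice.finish_reason,  # "stop" = finished, "length" = hit max_tokens
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "total_tokens": response.usage.total_tokens,
    }

# A stand-in object shaped like the SDK response (illustrative values only).
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(content="RAG looks things up before answering."),
            finish_reason="stop",
        )
    ],
    usage=SimpleNamespace(prompt_tokens=24, completion_tokens=9, total_tokens=33),
)
print(summarize_response(fake_response))
```

Logging this small dict instead of the whole response keeps production logs compact.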
Visual Learning Suggestion
A request-response diagram: Python script -> HTTP POST with model + messages -> OpenAI -> JSON response -> Python prints.
Interactive Element
Run the above script. Change temperature to 0, 1, and 1.5. Note differences.
Hands-on Lab
Build an ask(question) function that wraps the API call. Use it in a 1-line CLI: python ask.py "What is a transformer?". 20 lines max.
Mini Exercise
How many tokens did your request use? Where is that printed?
Common Mistakes
- Using the wrong model id (a typo usually fails with a not-found error, but a valid id you did not intend runs without complaint)
- Setting max_tokens too low and getting truncated responses
- Logging the full response object in production (it is big)
Debugging Tips
If you see AuthenticationError, your env var is missing or wrong. If RateLimitError, back off. If BadRequestError, your message format is malformed.
Knowledge Check Questions
- What three inputs does every chat completion need?
- Where is the assistant's reply in the response object?
- Where is the token usage?
Quiz Questions
- The assistant's text is at: a) response.text b) response.choices[0].message.content c) response.body d) response.completion Answer: b
Challenge Task
Build a compare_models(question) function that calls 3 different OpenAI models on the same question and prints the answers and token costs side by side.
Real-world Use Cases
- Backend AI endpoints
- Scripts for data enrichment
- Quick experimentation in notebooks
Industry Insight
In 2026 production, almost no one calls the API directly inside request handlers. They wrap it in a service with retries, caching, and logging. Lesson 5.7 covers the retry part and Lesson 5.8 the caching part.
Interview Questions
- Walk me through a basic OpenAI API call.
- How do you read token usage from the response?
- What is the difference between gpt-4o and gpt-4o-mini?
Summary
Three lines: import, client, create. You now ship code that calls ChatGPT.
Lesson 5.4: The same call in Gemini and Claude
Hook / Why This Matters
Multi-provider literacy is a 2026 must. Outages, price changes, and capability differences all favor engineers who can swap providers in an hour, not a month.
Beginner Analogy
Three taxi apps in your city. The button is in slightly different places but the trip is the same. You should not become loyal to one because of the button.
Concept Explanation
All three APIs follow the same pattern with slight syntactic differences. The mental model is identical: messages in, response out, usage tracked.
Technical Breakdown
Google Gemini:
import os
from google import genai
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain RAG to a beginner in 2 sentences.",
)
print(response.text)
Anthropic Claude:
import os
from anthropic import Anthropic
client = Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=200,
    system="You are a tutor. Reply in 2 sentences.",
    messages=[{"role": "user", "content": "Explain RAG to a beginner."}],
)
print(message.content[0].text)
Key differences:
| Aspect | OpenAI | Gemini | Anthropic |
|---|---|---|---|
| SDK name | openai | google-genai | anthropic |
| System message | role in messages | system_instruction | top-level system param |
| Response access | .choices[0].message.content | .text | .content[0].text |
| Streaming | yes | yes | yes |
| JSON Schema | yes | yes | yes (tools) |
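The "System message" row is the difference that most often trips people up when porting. A small sketch of how one OpenAI-style messages list could be reshaped for the other two providers follows. The function names are hypothetical, and the Gemini shape (a role plus a parts list, with the assistant called "model") is my understanding of the API and worth checking against the current SDK docs.

```python
def split_system(messages):
    """Separate system text from the rest of an OpenAI-style messages list."""
    system_text = "\n".join(m["content"] for m in messages if m["role"] == "system")
    chat = [m for m in messages if m["role"] != "system"]
    return system_text, chat

def to_anthropic(messages):
    """Build keyword arguments with a top-level system parameter."""
    system_text, chat = split_system(messages)
    kwargs = {"messages": chat}
    if system_text:
        kwargs["system"] = system_text
    return kwargs

def to_gemini(messages):
    """Build a system instruction plus Gemini-style contents (assumed shape)."""
    system_text, chat = split_system(messages)
    contents = [
        {
            # Gemini names the assistant role "model".
            "role": "model" if m["role"] == "assistant" else "user",
            "parts": [{"text": m["content"]}],
        }
        for m in chat
    ]
    return {"system_instruction": system_text or None, "contents": contents}
```

The Anthropic result can be unpacked into the call shown earlier, for example client.messages.create(model=..., max_tokens=..., **to_anthropic(messages)).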
Visual Learning Suggestion
A 3-column table comparing the three APIs across 6 dimensions (auth, system msg, streaming, JSON, tools, multimodal).
Interactive Element
Run the same prompt through all three providers. Note differences in tone, length, format.
Hands-on Lab
Write a single Python file with ask_openai(q), ask_gemini(q), ask_claude(q). Run the same prompt through each. Save outputs for comparison.
Mini Exercise
Which provider has the largest free tier as of today? (Check Google AI Studio.)
Common Mistakes
- Hard-coding provider-specific behavior throughout an app
- Forgetting that Claude requires max_tokens
- Comparing free-tier Gemini to paid GPT-4o and concluding quality differences
Debugging Tips
When porting between providers, isolate the call in a thin adapter. Diffs become trivial.
Knowledge Check Questions
- Where does Claude take the system instruction?
- What is Gemini's free tier good for?
- Which response access pattern feels most familiar to you?
Quiz Questions
- Anthropic Claude's system instruction is sent as: a) The first message in the array b) A top-level system parameter c) Inside the user message d) Not supported Answer: b
Challenge Task
Build a "provider abstraction" layer with a single chat(provider, messages) function that works across all three.
Real-world Use Cases
- Multi-provider fallback for resilience
- Price-aware routing
- Vendor lock-in avoidance
Industry Insight
The 2026 trend is provider-agnostic apps: route easy queries to cheap providers, hard ones to frontier. Knowing the SDKs natively (not via framework wrappers) makes you the engineer the team trusts.
Interview Questions
- Compare the three frontier provider SDKs.
- How would you design a provider-agnostic wrapper?
- When would you use Gemini over OpenAI?
Summary
Same shape, different buttons. Learn all three. It will take you one weekend and pay back forever.
Lesson 5.5: Streaming responses
Hook / Why This Matters
Why does ChatGPT feel snappy and your beginner script feel slow? Streaming. This lesson upgrades your apps from "AI wait" to "AI flow".
Beginner Analogy
Watching a video buffer vs streaming. Same content, totally different feel. You will never want non-streaming UX after this lesson.
Concept Explanation
Set stream=True. The SDK returns an iterator of token chunks. You print them as they arrive. The user sees instant feedback. Perceived latency drops dramatically even if total time is identical.
Technical Breakdown
OpenAI streaming:
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a 3-sentence story."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
Gemini and Anthropic have analogous patterns.
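Whatever the provider, the consuming loop has the same shape: print each piece as it arrives and keep the pieces so you still end up with the full text. The sketch below uses a hypothetical consume_stream helper and a fake generator in place of a real stream, so it runs offline. It also records the time to the first piece, which is the number that drives perceived latency.

```python
import time

def consume_stream(deltas, clock=time.perf_counter):
    """Print text pieces as they arrive; return the full text and time to first piece."""
    start = clock()
    first_piece_at = None
    pieces = []
    for delta in deltas:
        if not delta:  # some chunks carry no text
            continue
        if first_piece_at is None:
            first_piece_at = clock() - start
        print(delta, end="", flush=True)
        pieces.append(delta)
    print()
    return "".join(pieces), first_piece_at

def fake_stream():
    """Stand-in for a provider stream (illustrative data only)."""
    for piece in ["Once ", "upon ", None, "a time."]:
        time.sleep(0.01)
        yield piece

text, seconds_to_first = consume_stream(fake_stream())
```

With the OpenAI loop above, the input could be a generator expression such as (chunk.choices[0].delta.content for chunk in stream).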
Visual Learning Suggestion
A side-by-side animation: non-streaming (silent for 4 seconds, then the full text) vs streaming (text appears token by token, starting at 0.3 seconds).
Interactive Element
Run the streaming snippet. Now run the non-streaming version. Feel the latency difference.
Hands-on Lab
Convert your ask(question) function from Lesson 5.3 to stream. Print tokens as they arrive.
Mini Exercise
Why does streaming reduce perceived latency but not total tokens or cost?
Common Mistakes
- Forgetting flush=True (output buffers awkwardly)
- Not handling cancellations (user closes tab, stream keeps running, you pay)
- Concatenating chunks into one string without timing it
Debugging Tips
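If the stream stops mid-output, check the final chunk's finish_reason: "length" means max_tokens cut the response off, while an exception or a missing finish_reason points to a network hiccup. Catch the error and retry, or ask the model to continue from the partial text.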
Knowledge Check Questions
- What changes in the API call to enable streaming?
- Why does streaming feel faster?
- How do you reduce wasted cost when the user cancels?
Quiz Questions
- The total token cost of a streamed response vs non-streamed is: a) Higher b) Lower c) The same (for the tokens actually generated) d) Unpredictable Answer: c
Challenge Task
Build a tiny terminal chat app that streams. Add Ctrl+C to interrupt.
Real-world Use Cases
- All chat UIs
- Long-form generation tasks
- Live coding assistants
Industry Insight
In 2026 every chat product streams. Non-streaming feels broken. The infra cost is the same. Adopt streaming on day one.
Interview Questions
- How does streaming work in the OpenAI SDK?
- How would you cancel a stream mid-flight?
- How does streaming affect cost?
Summary
Streaming is a free UX win. Always default on for chat.
Lesson 5.6: Structured outputs across providers
Hook / Why This Matters
Module 4.5 taught the why. This lesson covers the how across all three major providers.
Beginner Analogy
Same shipping label format, three different printers. You need to know which buttons make each printer behave.
Concept Explanation
All three providers support enforced JSON outputs:
- OpenAI: response_format={"type": "json_schema", "json_schema": {...}}
- Gemini: response_mime_type="application/json" + response_schema=...
- Anthropic: tool use with a tool that takes a typed input matching your schema
Technical Breakdown
Anthropic example (tool-as-schema pattern):
tool = {
    "name": "save_resume",
    "description": "Save extracted resume fields",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "skills": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["name", "skills"],
    },
}
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=400,
    tools=[tool],
    tool_choice={"type": "tool", "name": "save_resume"},
    messages=[{"role": "user", "content": "Aman, Python and React."}],
)
print(message.content[0].input)
The model is forced to call save_resume with structured args matching the schema.
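Because every provider accepts a JSON Schema, one schema definition can feed more than one SDK. The sketch below wraps the resume schema for OpenAI and Anthropic. The helper names are hypothetical, and the inner keys of the OpenAI payload (name, strict, schema) reflect my understanding of that format, so verify them against the current API reference.

```python
RESUME_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "skills": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "skills"],
    "additionalProperties": False,  # closes the "extra properties" loophole
}

def as_openai_response_format(name, schema):
    """Wrap a JSON Schema in the json_schema response_format shape (assumed keys)."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": schema},
    }

def as_anthropic_tool(name, description, schema):
    """Wrap the same JSON Schema as an Anthropic tool definition."""
    return {"name": name, "description": description, "input_schema": schema}

openai_format = as_openai_response_format("resume", RESUME_SCHEMA)
anthropic_tool = as_anthropic_tool("save_resume", "Save extracted resume fields", RESUME_SCHEMA)
```

Keeping the schema in one constant means a field added later shows up for every provider at once.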
Visual Learning Suggestion
3-column comparison: same schema, three SDK calls, identical resulting JSON.
Interactive Element
Use any provider's playground to set a JSON schema and watch it enforce.
Hands-on Lab
Write an extract_resume(text, provider) function that produces the same JSON shape from all three providers.
Mini Exercise
Why does Anthropic use tools to enforce schemas rather than a response_format flag?
Common Mistakes
- Forgetting strict: true (OpenAI) or required fields
- Letting the model add freestyle commentary alongside JSON
- Mixing schemas across versions
Debugging Tips
If you still see invalid JSON, you are either not in structured mode or your schema has loopholes (missing required, extra properties).
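A quick way to find such loopholes is to check the parsed output yourself. The sketch below is a deliberately small checker for flat schemas only (required keys, unexpected keys, basic types); check_against_schema is a hypothetical name, and a dedicated validation library would be the usual choice for anything nested.

```python
RESUME_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "skills": {"type": "array"},
    },
    "required": ["name", "skills"],
}

TYPE_MAP = {"string": str, "array": list, "object": dict, "boolean": bool}

def check_against_schema(data, schema):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    properties = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in data:
            problems.append(f"missing required field: {key}")
    for key, value in data.items():
        if key not in properties:
            problems.append(f"unexpected field: {key}")
            continue
        expected = TYPE_MAP.get(properties[key].get("type"))
        if expected and not isinstance(value, expected):
            problems.append(f"wrong type for {key}")
    return problems

print(check_against_schema({"name": "Aman"}, RESUME_SCHEMA))
```

An empty result on real model output is a reasonable signal that structured mode is active and the schema is tight enough.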
Knowledge Check Questions
- How does OpenAI enforce JSON schemas?
- How does Anthropic achieve the same?
- Why are structured outputs important?
Quiz Questions
- The most reliable structured output mode in Anthropic is: a) Asking nicely in the prompt b) Tool use with input_schema c) Setting response_format d) Using a custom parser Answer: b
Challenge Task
Build a multi-provider structured extractor that works for both invoices and resumes via the same extract(schema, text) function.
Real-world Use Cases
- Resume parsing (the Module 7 project)
- Invoice extraction
- Form auto-fill from chat
- Agent planning that emits structured plans
Industry Insight
Structured outputs collapse the gap between LLMs and traditional systems. Mastering them is the move from "AI demo" to "AI in production".
Interview Questions
- Compare structured output approaches across providers.
- When would you not use structured outputs?
- What happens if your schema is too loose?
Summary
All three providers enforce structured outputs, with slightly different ergonomics. Same schema, three buttons. Master all three.
Lesson 5.7: Rate limits, retries, and resilient code
Hook / Why This Matters
Your script worked five times. The sixth time it hit a rate limit and crashed your app. This lesson keeps your code calm when the API gets noisy.
Beginner Analogy
Even the fanciest restaurant has a limit on how many orders the kitchen takes per minute. Your job is to wait politely, not bang on the counter.
Concept Explanation
Common failure modes:
- 429 Rate limit: too many requests per minute (RPM) or tokens per minute (TPM)
- 5xx server errors: transient
- 400 Bad request: your fault (malformed input)
- 401 Auth error: bad or expired key
- Timeouts: network or model latency spike
The fix for 429 and 5xx: exponential backoff with jitter. Wait 1 second, then 2, then 4, then 8, with a small random offset.
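The wait schedule described above can be computed directly. This is a minimal sketch: backoff_delays is a hypothetical helper, and the half-second jitter range and 30-second cap are illustrative choices rather than recommended values.

```python
import random

def backoff_delays(attempts, base=1.0, cap=30.0, jitter=0.5, rng=None):
    """Return one wait per retry: base * 2**n, capped, plus a small random offset."""
    rng = rng or random.Random()
    delays = []
    for n in range(attempts):
        exponential = min(cap, base * 2 ** n)  # 1, 2, 4, 8, ... up to the cap
        delays.append(exponential + rng.uniform(0, jitter))
    return delays

# A seeded generator makes the schedule repeatable for a demo.
print([round(d, 2) for d in backoff_delays(4, rng=random.Random(0))])
```

Each printed value sits slightly above 1, 2, 4, and 8 seconds; the cap keeps later waits from growing without bound.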
Technical Breakdown
Using tenacity:
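from openai import APIConnectionError, InternalServerError, RateLimitError
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential
# Retry only transient errors (429, 5xx, network); 400 and 401 fail immediately.
TRANSIENT = (RateLimitError, InternalServerError, APIConnectionError)
@retry(retry=retry_if_exception_type(TRANSIENT), wait=wait_random_exponential(min=1, max=30), stop=stop_after_attempt(5))
def call_llm(messages, model="gpt-4o-mini"):
    return client.chat.completions.create(model=model, messages=messages)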
Five tries, exponentially increasing waits, then surrender.
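To see what such a retry policy does under the hood, here is a library-free sketch of the same idea. ApiError is a hypothetical exception carrying an HTTP status code (real SDKs raise their own error classes), and the retryable set follows the failure modes listed in the Concept Explanation.

```python
import random
import time

RETRYABLE_STATUS = {429, 500, 502, 503, 504}

class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status

def call_with_retries(fn, max_attempts=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Call fn(); on a retryable ApiError wait with jittered backoff, then try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ApiError as err:
            # 400 and 401 fail at once; retryable errors fail after the last attempt.
            if err.status not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            sleep(random.uniform(0, min(cap, base * 2 ** (attempt - 1))))
```

Passing sleep as a parameter is a small design choice that lets tests swap in a no-op instead of really waiting.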
Visual Learning Suggestion
A graph: x = attempt number, y = wait time, with a jittered exponential curve.
Interactive Element
Trigger your own rate limit by calling the API in a tight loop. Note when 429 shows up. Implement backoff. Re-run. See it recover.
Hands-on Lab
Wrap your ask() function with retries. Add structured logging of failures.
Mini Exercise
Why is jitter important?
Common Mistakes
- Retrying on 400s (the request is malformed, so every retry fails the same way)
- Retrying without backoff (you make it worse)
- No max-retry limit
Debugging Tips
If you keep hitting rate limits, the fix is often "smaller model", "batched calls", or "request a higher tier", not "retry harder".
Knowledge Check Questions
- Which errors should you retry?
- What is exponential backoff?
- Why jitter?
Quiz Questions
- You should NOT retry: a) 429 rate limit b) 502 bad gateway c) 400 bad request d) 503 service unavailable Answer: c
Challenge Task
Add a "circuit breaker" pattern: after 10 consecutive failures, stop calling for 5 minutes and alert.
Real-world Use Cases
- Production AI APIs
- Batch processing jobs
- Customer-facing chat under load
Industry Insight
The single most common 2026 incident is "we hit rate limits and the whole pipeline died". Resilient retry code is what keeps you off the incident report.
Interview Questions
- Implement exponential backoff with jitter.
- Which errors should you not retry?
- What is a circuit breaker?
Summary
APIs fail. Code defensively: backoff + jitter + max retries + don't retry 4xx errors other than 429. Adopt the pattern once and reuse it everywhere.
Lesson 5.8: Cost control patterns
Hook / Why This Matters
Final lesson of the module. Cost control is where engineers earn or lose their jobs. Master these five patterns and your apps stay cheap.
Beginner Analogy
Five money-saving habits at a restaurant: order what you need, share, take leftovers home, check the bill, set a monthly cap. Same five habits apply to LLM apps.
Concept Explanation
The five cost control patterns:
- Right-size the model: smaller model for easy tasks.
- Cache repeated calls: identical input -> stored output.
- Compress prompts: drop boilerplate, summarize history.
- Use prompt caching: providers reuse prefix tokens at lower cost.
- Hard caps: monthly spending limits at the provider dashboard.
Technical Breakdown
A simple memory cache:
import hashlib, json
_cache = {}
def cached_ask(messages, model="gpt-4o-mini"):
    # Include the model in the key so two models never share a cached answer.
    payload = {"model": model, "messages": messages}
    key = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    if key in _cache:
        return _cache[key]
    out = call_llm(messages, model=model)
    _cache[key] = out
    return out
In production, replace the dict with Redis or Cloudflare KV.
Visual Learning Suggestion
A waterfall chart: original cost -> after right-sizing -> after caching -> after compression -> after prompt caching, each step dropping the bar.
Interactive Element
Take any prompt you wrote this module. Estimate its monthly cost at 1,000 users x 5 calls/day. Apply each pattern. Re-estimate.
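The estimate is plain arithmetic: calls per month times tokens per call times price per token. A sketch with a hypothetical monthly_cost helper follows. The token counts and prices are placeholders, not real quotes, so substitute the numbers from your provider's pricing page.

```python
def monthly_cost(users, calls_per_day, input_tokens, output_tokens,
                 input_price_per_million, output_price_per_million, days=30):
    """Estimate monthly spend in dollars from traffic and per-call token counts."""
    calls = users * calls_per_day * days
    input_cost = calls * input_tokens * input_price_per_million / 1_000_000
    output_cost = calls * output_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost

# Placeholder assumptions: 500 input and 200 output tokens per call,
# $1.00 per million input tokens and $4.00 per million output tokens.
baseline = monthly_cost(1_000, 5, 500, 200, 1.00, 4.00)
print(f"${baseline:,.2f} per month")  # prints "$195.00 per month"
```

With these placeholder numbers, 150,000 calls a month cost $75 for input and $120 for output. Re-running the function with a cheaper model's prices or a shorter prompt shows the effect of each pattern.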
Hands-on Lab
Add caching to your ask() function. Re-run the same prompt twice. Confirm the second call is free.
Mini Exercise
Why is response caching a poor fit for chat with temperature > 0?
Common Mistakes
- Caching responses that include time-sensitive info
- Forgetting that different users may need different responses
- Not setting hard spending caps (every horror story starts here)
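The first two mistakes can be addressed in the cache itself: give entries an expiry, and make the user part of the key when answers are user-specific. A minimal sketch follows, with hypothetical names and an in-memory dict standing in for a real store. The fn argument is whatever function actually calls the model.

```python
import hashlib
import json
import time

_cache = {}

def cache_key(messages, model, user_id=None):
    """Hash everything that should change the answer, including the user when relevant."""
    payload = {"model": model, "messages": messages, "user_id": user_id}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def cached_call(fn, messages, model, user_id=None, ttl_seconds=3600, now=time.time):
    """Return a cached answer younger than ttl_seconds, else call fn and store the result."""
    key = cache_key(messages, model, user_id)
    entry = _cache.get(key)
    if entry and now() - entry["stored_at"] < ttl_seconds:
        return entry["value"]
    value = fn(messages, model)
    _cache[key] = {"value": value, "stored_at": now()}
    return value
```

A short ttl_seconds suits time-sensitive answers; leaving user_id as None shares one entry across all users, which is only appropriate for generic questions.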
Debugging Tips
If your bill spikes, the fix is usually one of: model too big, no caching, history grew unboundedly, no spending cap. Check in that order.
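Unbounded history is the easiest of these to fix in code. One simple approach, sketched below with a hypothetical trim_history helper, keeps the system message and only the most recent turns. Summarizing older turns is an alternative this sketch does not attempt.

```python
def trim_history(messages, max_messages=6):
    """Keep every system message plus only the most recent non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-max_messages:] if max_messages > 0 else []
    return system + recent

history = [{"role": "system", "content": "You are a tutor."}]
for turn in range(1, 6):
    history.append({"role": "user", "content": f"question {turn}"})
    history.append({"role": "assistant", "content": f"answer {turn}"})

trimmed = trim_history(history, max_messages=4)
print(len(history), "->", len(trimmed))  # prints "11 -> 5"
```

Calling trim_history just before each API request caps the input tokens per call no matter how long the conversation runs.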
Knowledge Check Questions
- Name three cost control patterns.
- Why is prompt caching useful for shared system prompts?
- When does response caching fail?
Quiz Questions
- The fastest cost saving in most apps is: a) Disable streaming b) Switch to a smaller model for non-critical paths c) Use temperature 0 d) Compress JSON Answer: b
Challenge Task
Audit one of your own AI scripts. Apply the five patterns. Report the projected cost drop.
Real-world Use Cases
- Cost-aware production apps
- Internal AI tools with strict budgets
- Free-tier consumer apps
Industry Insight
The engineer who can cut AI costs 80% without quality loss is the most valuable person on any AI team. This skill compounds quarterly as your usage scales.
Interview Questions
- What patterns do you use to control LLM cost?
- How do you decide between two model sizes?
- When would you cache LLM responses?
Summary
Right-size, cache, compress, prompt-cache, cap. Five habits. Apply forever.
Module 5 Recap
You can call the OpenAI, Gemini, and Anthropic APIs, stream, get structured outputs, handle errors, and control costs. You are no longer dependent on ChatGPT's web UI. You can build.
SEO Notes
- Primary keyword: "OpenAI API tutorial for beginners"
- Long-tail targets: "Gemini API Python", "Anthropic Claude API beginner", "stream OpenAI Python", "LLM cost control"
- Schema: HowTo for each lab
- Internal links: Module 4 (prompts), Module 6 (apps), Module 10 (deploy)