Streaming responses
Why does ChatGPT feel snappy and your beginner script feel slow? Streaming. This lesson upgrades your apps from "AI wait" to "AI flow".
Think of waiting for a whole video to buffer versus streaming it: same content, totally different feel. You will never want a non-streaming UX after this lesson.
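Set stream=True in the request. Instead of one finished response, the SDK returns an iterator of chunks, each carrying a small piece of the reply. You print each piece as it arrives, so the user sees feedback almost immediately. Perceived latency drops dramatically even though the total generation time is about the same.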
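To see why the wait feels shorter even when total time does not change, here is a small simulation that makes no API call. The fake_stream generator, the measure helper, and the 0.05-second delay are all made up for illustration; a real model's timing will differ.

```python
import time
from typing import Iterator, Optional, Tuple


def fake_stream(words: list, delay: float) -> Iterator[str]:
    """Hypothetical stand-in for a model: yields one word every `delay` seconds."""
    for word in words:
        time.sleep(delay)
        yield word + " "


def measure(words: list, delay: float) -> Tuple[Optional[float], float]:
    """Print chunks as they arrive; return (time to first chunk, total time)."""
    start = time.perf_counter()
    first = None
    for chunk in fake_stream(words, delay):
        if first is None:
            # The moment the user first sees something on screen.
            first = time.perf_counter() - start
        print(chunk, end="", flush=True)
    print()
    total = time.perf_counter() - start
    return first, total


words = "Once upon a time a robot learned to stream".split()
first, total = measure(words, delay=0.05)
print(f"first chunk after {first:.2f}s, full answer after {total:.2f}s")
```

Without streaming, the user would stare at a blank screen for the whole "full answer" time. With streaming, the wait that matters is the much shorter time to the first chunk.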
OpenAI streaming:
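from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# stream=True returns an iterator of chunks instead of one finished response
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a 3-sentence story."}],
    stream=True,
)
for chunk in stream:
    # each chunk carries a small piece of text; content can be None
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)  # flush so text shows up right away
print()  # end the line once the stream is done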
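Printing is only half the job in most apps: you usually also want the full reply afterwards, for example to save it to the chat history. A minimal sketch of that, run against fake chunks so it works without an API key. The fake_chunk and collect helpers are hypothetical names for this example; the fake objects only imitate the chunk.choices[0].delta.content shape used above.

```python
from types import SimpleNamespace


def fake_chunk(text):
    """Build a hypothetical object shaped like one streamed chat chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def collect(stream):
    """Print each text piece as it arrives and return the joined reply."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            # Depending on request options, a chunk may have no choices.
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # skip None and empty strings
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)


demo_stream = [
    fake_chunk(None),
    fake_chunk("Hello"),
    fake_chunk(", "),
    fake_chunk("world!"),
    SimpleNamespace(choices=[]),
]
reply = collect(demo_stream)
```

With a real client you would pass the stream object returned by the create call to collect in place of demo_stream.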
The Gemini and Anthropic SDKs follow an analogous pattern: ask for a stream, then loop over the pieces of text as they arrive.
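As one illustration, here is a sketch of the same loop with the Anthropic Python SDK, which exposes a messages.stream context manager and a text_stream iterator. The stream_claude function name and the max_tokens value of 256 are choices made for this example, and the model name is left as a parameter because available models change over time.

```python
def stream_claude(client, model: str, prompt: str) -> str:
    """Stream a reply from an Anthropic client, printing text as it arrives.

    `client` is expected to be an anthropic.Anthropic() instance.
    Returns the full reply once the stream ends.
    """
    parts = []
    with client.messages.stream(
        model=model,
        max_tokens=256,  # illustrative limit, adjust as needed
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
            parts.append(text)
    print()
    return "".join(parts)


# Usage (needs `pip install anthropic` and ANTHROPIC_API_KEY set):
#   import anthropic
#   stream_claude(anthropic.Anthropic(), "<a Claude model name>",
#                 "Tell me a 3-sentence story.")
```

The shape matches the OpenAI version: open a stream, iterate, print with flush=True. The main difference is that text_stream yields plain strings, so there is no delta object to unpack.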
Quick recall
What changes in the API call to enable streaming?
Quiz time
1. The total token cost of a streamed response vs non-streamed is