GeekHub Learn
Module 5: Using AI APIs (OpenAI, Gemini, Anthropic)
Lesson 5.5 · 5 of 8 in this module · 2 min read

Streaming responses

Why does ChatGPT feel snappy and your beginner script feel slow? Streaming. This lesson upgrades your apps from "AI wait" to "AI flow".

Think of a video that must finish downloading before it plays versus one that streams. Same content, totally different feel. You will never want non-streaming UX after this lesson.

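Set stream=True. The SDK then returns an iterator of small chunks (roughly a few tokens each) instead of one finished response. You print each chunk as it arrives, so the user sees feedback almost immediately. Perceived latency drops dramatically because the time to the first visible token shrinks, even if the total generation time is identical.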

OpenAI streaming:

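from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# stream=True returns an iterator of chunks instead of one full response
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a 3-sentence story."}],
    stream=True,
)
for chunk in stream:
    # some chunks carry no choices or no text, so skip those
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)  # flush so each piece shows immediately
print()  # end the line once the stream is done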

Gemini and Anthropic have analogous patterns.
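
To make "analogous" concrete, here is a minimal sketch of the Anthropic side, assuming the anthropic Python package is installed and ANTHROPIC_API_KEY is set. The model name is an assumption (swap in any model your account can use), and anthropic_text_chunks and print_stream are illustrative helper names, not part of any SDK. Gemini's SDK offers a streaming call as well, which is not shown here.

```python
from typing import Iterable, Iterator


def anthropic_text_chunks(prompt: str, model: str = "claude-3-5-haiku-latest") -> Iterator[str]:
    """Yield text pieces from Anthropic's streaming helper as they arrive."""
    import anthropic  # imported lazily so this file loads without the SDK installed

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    with client.messages.stream(
        model=model,  # assumed model name, replace as needed
        max_tokens=256,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for text in stream.text_stream:
            yield text


def print_stream(chunks: Iterable[str]) -> str:
    """Print each piece immediately and return the full text at the end."""
    parts = []
    for text in chunks:
        print(text, end="", flush=True)
        parts.append(text)
    print()
    return "".join(parts)
```

Calling print_stream(anthropic_text_chunks("Tell me a 3-sentence story.")) should behave like the OpenAI loop: text appears piece by piece, and you still get the complete string back.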

Visualize it

Picture two terminals side by side: non-streaming stays silent for 4 seconds and then shows the full text, while streaming starts showing text token by token at about 0.3 seconds.
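
If you want to see the comparison without spending API credits, the following is a small simulation. Nothing here calls a real model: fake_model is a stand-in that sleeps between tokens, and the delay value is an arbitrary assumption chosen to make the effect visible.

```python
import time
from typing import Iterator, List, Optional, Tuple

TOKENS = ["Once ", "upon ", "a ", "time, ", "a ", "robot ", "learned ", "to ", "stream."]


def fake_model(tokens: List[str], delay: float = 0.05) -> Iterator[str]:
    """Stand-in for a model: yields one token every `delay` seconds."""
    for token in tokens:
        time.sleep(delay)
        yield token


def run_non_streaming(tokens: List[str], delay: float = 0.05) -> Tuple[float, float]:
    """Wait for the whole reply, then show it. Returns (first_output_s, total_s)."""
    start = time.perf_counter()
    text = "".join(fake_model(tokens, delay))
    first_output = time.perf_counter() - start  # nothing was visible before this point
    print(text)
    return first_output, time.perf_counter() - start


def run_streaming(tokens: List[str], delay: float = 0.05) -> Tuple[Optional[float], float]:
    """Show each token as it arrives. Returns (first_output_s, total_s)."""
    start = time.perf_counter()
    first_output = None
    for token in fake_model(tokens, delay):
        if first_output is None:
            first_output = time.perf_counter() - start
        print(token, end="", flush=True)
    print()
    return first_output, time.perf_counter() - start


if __name__ == "__main__":
    for name, run in [("non-streaming", run_non_streaming), ("streaming", run_streaming)]:
        first, total = run(TOKENS)
        print(f"{name}: first output after {first:.2f}s, done after {total:.2f}s")
```

Both runs take about the same total time. Only the moment of first visible output moves, which is the whole point of streaming.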

Try it now

Run the streaming snippet. Now run the non-streaming version. Feel the latency difference.

Hands-on lab

Convert your ask(question) function from Lesson 5.3 to stream. Print tokens as they arrive.
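
One possible shape for the converted function is sketched below. The original ask(question) from Lesson 5.3 is not reproduced here, so this is an assumption about its signature: the client is passed in explicitly, and ask_stream and ask_and_print are illustrative names.

```python
from typing import Iterator


def ask_stream(client, question: str, model: str = "gpt-4o-mini") -> Iterator[str]:
    """Yield pieces of the answer as they arrive instead of returning one string."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:  # some chunks carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # skip empty or None deltas
            yield delta


def ask_and_print(client, question: str) -> str:
    """Print tokens as they arrive and return the full answer."""
    parts = []
    for piece in ask_stream(client, question):
        print(piece, end="", flush=True)
        parts.append(piece)
    print()
    return "".join(parts)
```

With the OpenAI SDK installed, ask_and_print(OpenAI(), "Why is the sky blue?") would print the answer progressively. Making ask_stream a generator keeps the printing decision with the caller, so the same function can feed a terminal, a web response, or a test.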

Try it now

Why does streaming reduce perceived latency but not total tokens or cost?

Common mistakes

  • Forgetting flush=True (output buffers awkwardly)
  • Not handling cancellations (user closes tab, stream keeps running, you pay)
  • Concatenating chunks into one string without timing it (you lose the time-to-first-token number that shows whether streaming is actually helping)
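
The cancellation and timing items above can be handled in one small consumer loop. This is a sketch under assumptions: stream is any iterator of text pieces (for example a generator wrapping the SDK call), and should_stop is a hypothetical callback you would wire to your own "user went away" signal. Whether the provider stops generating the moment the connection closes is provider-dependent, so treat closing as necessary rather than as a guarantee.

```python
import time
from typing import Callable, Optional, Tuple


def consume(stream, should_stop: Callable[[], bool] = lambda: False) -> Tuple[str, Optional[float]]:
    """Print text pieces as they arrive and stop early if should_stop() is true.

    Returns (text_so_far, seconds_to_first_piece).
    """
    start = time.perf_counter()
    first_piece_s = None
    parts = []
    try:
        for text in stream:
            if first_piece_s is None:
                first_piece_s = time.perf_counter() - start  # time to first token
            print(text, end="", flush=True)  # flush so output is not held in a buffer
            parts.append(text)
            if should_stop():
                break
    finally:
        # Close the stream if it supports it, so an abandoned request does not keep running.
        close = getattr(stream, "close", None)
        if callable(close):
            close()
        print()
    return "".join(parts), first_piece_s
```

Logging the second return value over a few runs gives you a concrete time-to-first-token figure instead of a feeling.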

Debugging tip

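If the output stops mid-sentence, check the finish_reason on the last chunk: "length" means the response was cut by max_tokens. If the stream stalls or raises an error instead, suspect a network hiccup. Either way, catch it, keep the text you already received, and resume (retry the request or ask the model to continue).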
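
One way to tell a max_tokens cutoff from a network problem in code is to record the finish reason and any exception while collecting the text. This sketch assumes OpenAI-style chat chunks (choices[0].delta.content and choices[0].finish_reason). collect_with_status and diagnose are illustrative names, and catching every Exception is a simplification for the example.

```python
from typing import Optional, Tuple


def collect_with_status(stream) -> Tuple[str, Optional[str], Optional[Exception]]:
    """Collect streamed text and report how the stream ended.

    Returns (text_so_far, finish_reason, error).
    """
    parts = []
    finish_reason = None
    error = None
    try:
        for chunk in stream:
            if not chunk.choices:
                continue
            choice = chunk.choices[0]
            if choice.delta.content:
                parts.append(choice.delta.content)
            if choice.finish_reason:
                finish_reason = choice.finish_reason
    except Exception as exc:  # broad on purpose in this sketch
        error = exc
    return "".join(parts), finish_reason, error


def diagnose(finish_reason: Optional[str], error: Optional[Exception]) -> str:
    """Turn the raw status into a short label you can act on."""
    if error is not None:
        return "interrupted"  # network or SDK error: keep the partial text and retry
    if finish_reason == "length":
        return "truncated"  # hit max_tokens: raise the limit or ask the model to continue
    if finish_reason == "stop":
        return "complete"
    return "unknown"
```

The partial text is returned in every case, so a retry or a follow-up "continue" request can build on what the user has already seen.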

Challenge

Build a tiny terminal chat app that streams. Add Ctrl+C to interrupt.
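
If you want a starting point for the challenge, here is a minimal sketch, assuming the OpenAI SDK and the same model as the snippet earlier in the lesson. stream_reply and main are illustrative names. Ctrl+C during a reply keeps the partial text, and Ctrl+C at the prompt exits.

```python
def stream_reply(client, history, model: str = "gpt-4o-mini") -> str:
    """Stream one assistant reply; Ctrl+C stops it and keeps the partial text."""
    parts = []
    stream = client.chat.completions.create(model=model, messages=history, stream=True)
    try:
        for chunk in stream:
            if not chunk.choices:
                continue
            delta = chunk.choices[0].delta.content
            if delta:
                print(delta, end="", flush=True)
                parts.append(delta)
    except KeyboardInterrupt:
        print(" [interrupted]", end="")
    finally:
        # Close the stream if it supports it, so the request does not linger.
        close = getattr(stream, "close", None)
        if callable(close):
            close()
        print()
    return "".join(parts)


def main() -> None:
    """Tiny chat loop. Call main() to start it."""
    from openai import OpenAI  # imported lazily so the helper above has no SDK dependency

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    history = []
    while True:
        try:
            user_text = input("you> ")
        except (EOFError, KeyboardInterrupt):
            print()
            break
        if not user_text.strip():
            continue
        history.append({"role": "user", "content": user_text})
        reply = stream_reply(client, history)
        if reply:  # skip empty replies, e.g. interrupted before any text arrived
            history.append({"role": "assistant", "content": reply})
```

Keeping the interrupt handling inside stream_reply means the loop itself stays simple, and the conversation history still contains whatever the model managed to say.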

Where this shows up

  • All chat UIs
  • Long-form generation tasks
  • Live coding assistants

From the field

In 2026 every chat product streams. Non-streaming feels broken. The infra cost is the same. Adopt streaming on day one.

Recap

Streaming is a free UX win. Turn it on by default for chat.


Quick recall

What changes in the API call to enable streaming?

Quiz time

  1. The total token cost of a streamed response vs non-streamed is
