GeekHub Learn

From prompt to answer: the 6-step journey

When you press Enter in ChatGPT, six things happen in sequence. Every later module hooks into one of these steps, so treat this lesson as your skeleton.

Imagine a translator at the UN. They hear your words, mentally turn each into a concept, listen to the rest of the sentence to decide what each concept emphasizes, then say one word at a time in the target language, keeping in mind what they have already said. Transformers do roughly the same thing, but with numbers.

The six steps:

  1. Tokenization: your text is chopped into small chunks called tokens.
  2. Embedding: each token is turned into a vector of numbers.
  3. Attention: each token "looks at" the others to decide which matter for its meaning in this context.
  4. Stacked layers: attention, followed by a feed-forward step, repeats across many layers, each one refining the representation.
  5. Next-token prediction: the model outputs a probability for every token in its vocabulary.
  6. Sampling and loop: one token is picked according to those probabilities and appended to the sequence, then the model runs again on the longer sequence until it emits a stop token or hits a length limit.
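
The six steps above can be sketched as a toy loop. This is only an illustration under heavy assumptions: the vocabulary is invented, tokenization is a plain whitespace split, and `toy_logits` is a hypothetical hand-written rule standing in for steps 2 to 4 (embedding, attention, stacked layers). Only the shape of the loop resembles a real model.

```python
import math
import random

# Invented toy vocabulary; a real model has tens of thousands of subword tokens.
VOCAB = ["<stop>", "the", "capital", "of", "france", "is", "paris"]
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}


def tokenize(text):
    # Step 1 (toy): lowercase whitespace split. Words outside VOCAB raise KeyError.
    return [TOKEN_ID[word] for word in text.lower().split()]


def toy_logits(token_ids):
    # Hypothetical stand-in for steps 2-4: a fixed rule that gives a high
    # score to the vocabulary entry right after the last token seen.
    scores = [0.0] * len(VOCAB)
    favored = (token_ids[-1] + 1) % len(VOCAB)
    scores[favored] = 5.0
    return scores


def softmax(scores):
    # Step 5: turn raw scores into probabilities that sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt, max_new_tokens=10, seed=0):
    rng = random.Random(seed)
    ids = tokenize(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(ids))
        # Step 6: sample one token by probability, then loop.
        next_id = rng.choices(range(len(VOCAB)), weights=probs, k=1)[0]
        if VOCAB[next_id] == "<stop>":
            break
        ids.append(next_id)
    return " ".join(VOCAB[i] for i in ids)


print(generate("The capital of France is"))
```

Because the next token is sampled rather than looked up, a different seed can occasionally produce a different continuation, even though "paris" is by far the most likely one here.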

That is the core of ChatGPT. Everything else (RAG, fine-tuning, agents) wraps or augments these six steps.

The model is a stack of layers that share the same structure but have separately learned weights (often 32 to 96 of them in modern LLMs). Each layer has two sub-layers: self-attention and a feed-forward network. The output of each layer is the input to the next. After the last layer, a final linear projection maps the result to vocabulary size, and a softmax turns those scores into probabilities.
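
A rough sketch of that stack, in plain Python with random weights, may help make the data flow concrete. It makes several simplifying assumptions that are not from the lesson: a single attention head with no learned query/key/value projections, a causal mask (each position sees only itself and earlier positions), residual additions, no layer normalization, and tiny invented sizes (`DIM`, `HIDDEN`, `VOCAB_SIZE`, `NUM_LAYERS`).

```python
import math
import random


def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def self_attention(vectors):
    # Simplified single-head attention: each position mixes in the vectors
    # of itself and earlier positions, weighted by scaled dot-product similarity.
    dim = len(vectors[0])
    mixed = []
    for i, query in enumerate(vectors):
        scores = [
            sum(q * k for q, k in zip(query, vectors[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        mixed.append(
            [sum(w * vectors[j][d] for j, w in enumerate(weights)) for d in range(dim)]
        )
    return mixed


def feed_forward(vec, w_in, w_out):
    hidden = [max(0.0, h) for h in matvec(w_in, vec)]  # ReLU non-linearity
    return matvec(w_out, hidden)


def forward(vectors, layers, unembed):
    # Each layer: self-attention, then feed-forward; its output feeds the next layer.
    for w_in, w_out in layers:
        attended = self_attention(vectors)
        vectors = [add(v, a) for v, a in zip(vectors, attended)]
        vectors = [add(v, feed_forward(v, w_in, w_out)) for v in vectors]
    # Final projection to vocabulary size, then softmax over the last position.
    return softmax(matvec(unembed, vectors[-1]))


rng = random.Random(0)
DIM, HIDDEN, VOCAB_SIZE, NUM_LAYERS = 4, 8, 10, 3
layers = [
    (random_matrix(HIDDEN, DIM, rng), random_matrix(DIM, HIDDEN, rng))
    for _ in range(NUM_LAYERS)
]
unembed = random_matrix(VOCAB_SIZE, DIM, rng)
embedded_prompt = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(5)]
probs = forward(embedded_prompt, layers, unembed)
print(len(probs), round(sum(probs), 6))
```

The weights are random, so the probabilities are meaningless; the point is only that every layer has the same two sub-layers and that the final output is one probability per vocabulary entry.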

Visualize it

A horizontal pipeline diagram with six boxes, one per step, joined by arrows. A loop arrow runs from step 6 back to the start of the pipeline, showing autoregressive generation.

Try it now

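Ask ChatGPT: "What is your next token going to be after 'The capital of France is'? Show the top 5 candidates if you can." Notice that it strongly prefers "Paris". Treat any list it gives as an illustration: the chat interface does not expose the model's actual probabilities, but underneath, the model really is computing a distribution over tokens.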

Hands-on lab

Open https://platform.openai.com/tokenizer. Paste any 50-word paragraph. Observe how the text is split, and note that spaces, punctuation, and uncommon words are handled differently from common words. Tokens are not words.
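
To see offline why tokens are not words, here is a toy greedy tokenizer. It is only a sketch: the mini-vocabulary is invented for this example, and real tokenizers such as the one behind the tool above learn a much larger vocabulary from data (byte pair encoding, in OpenAI's case), so their splits will differ.

```python
# Invented mini-vocabulary. Note the leading spaces: in this toy, as in many
# real tokenizers, a space is usually part of the token that follows it.
VOCAB = {"Tok", "ens", " are", " not", " words", "."}


def toy_tokenize(text, vocab=VOCAB):
    tokens = []
    i = 0
    while i < len(text):
        # Take the longest vocabulary entry that matches at position i.
        for end in range(len(text), i, -1):
            if text[i:end] in vocab:
                tokens.append(text[i:end])
                i = end
                break
        else:
            # Nothing matched: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


print(toy_tokenize("Tokens are not words."))
# ['Tok', 'ens', ' are', ' not', ' words', '.']
```

Four words become six tokens here: "Tokens" is split in two because the toy vocabulary has no entry for the whole word, and the period is its own token.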

Try it now

Count: how many tokens is the sentence "Hello, world! This is GeekHub."? Use the tokenizer tool.

Common mistakes

  • Thinking the model "looks up" answers. It does not. It computes each token from its learned weights and the current context.
  • Thinking the model writes whole sentences at once. It writes one token at a time.
  • Confusing tokens with words. A token can be a whole word, part of a word, or a piece of punctuation.

Debugging tip

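If your app behaves oddly at high token counts, check whether you are bumping against the context window, the maximum number of tokens the model can take in at once. The loop in step 6 only sees what is inside the current context, so anything that no longer fits is invisible to it.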
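
One common way to stay inside the window is to drop the oldest messages first. The sketch below assumes a hypothetical `count_tokens` helper that just counts whitespace-separated pieces; in a real app you would replace it with the tokenizer that matches your model, and the limit of 6 is an invented number for the demo.

```python
def count_tokens(text):
    # Crude stand-in for a real tokenizer: counts whitespace-separated pieces.
    return len(text.split())


def trim_to_window(messages, max_tokens):
    """Keep the most recent messages whose combined token count fits."""
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept, used


history = ["a b c", "d e", "f g h i"]
print(trim_to_window(history, max_tokens=6))
# (['d e', 'f g h i'], 6)
```

Logging the `used` count next to each request is a cheap way to confirm whether odd behavior lines up with the window filling up.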

Challenge

Draw the 6-step diagram from memory on paper. Photograph it. Post it on GeekHub with #ai-beginners.

Where this shows up

  • Streaming chat UIs: the autoregressive loop is what lets you watch the response appear token by token.
  • Token-level cost analysis: usage is typically billed per token, so counting input and output tokens lets you estimate cost.
  • Function/tool calling: the model emits a structured token sequence the runtime intercepts.
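
The cost point in the list above comes down to simple arithmetic. The sketch below assumes input and output tokens are priced separately per million tokens, which is a common billing scheme; the prices used are placeholders invented for the example, not real rates, and `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Estimate the cost of one request, in the same currency as the prices."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Placeholder prices: 1.00 per million input tokens, 4.00 per million output tokens.
cost = estimate_cost(input_tokens=1200, output_tokens=400,
                     input_price_per_million=1.00,
                     output_price_per_million=4.00)
print(f"{cost:.4f}")
```

With these placeholder numbers the 400 output tokens cost more than the 1,200 input tokens, which is why long generations are usually where the spend goes.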

From the field

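Understanding this loop is what lets you write production code that calls an LLM with stream=True and handles tokens as they arrive. It is also why you can cancel a generation mid-flight: tokens that are never generated do not add to the output, which saves money on long outputs.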
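
The cancel-mid-flight idea can be simulated without any API. In this sketch `fake_token_stream` is a hypothetical stand-in for a streaming response, and the `stats` counter is invented to show how many tokens were actually produced; with a real client, stopping usually means closing the stream or connection, and the details depend on the library.

```python
def fake_token_stream(tokens, stats):
    # Stand-in for a streaming response: produces one token at a time,
    # and only when the consumer asks for the next one.
    for token in tokens:
        stats["generated"] += 1
        yield token


def consume(stream, max_tokens):
    received = []
    for token in stream:
        received.append(token)
        if len(received) >= max_tokens:
            stream.close()  # cancel: no further tokens are produced
            break
    return received


stats = {"generated": 0}
full_answer = ["The", " capital", " of", " France", " is", " Paris"]
received = consume(fake_token_stream(full_answer, stats), max_tokens=3)
print(received, stats["generated"])
# ['The', ' capital', ' of'] 3
```

Only three of the six tokens were ever produced, which mirrors why stopping early on a long output avoids generating (and paying for) the rest.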

Recap

ChatGPT is a six-step pipeline: tokenize, embed, attend, stack layers, predict the next token, sample and loop. Everything else builds on this.


Quick recall

Recite the six steps without looking.
