From prompt to answer: the 6-step journey
When you press Enter in ChatGPT, six things happen in sequence. Every later module hooks into one of these steps. This is your skeleton.
Imagine a translator at the UN. They hear your words, mentally turn each one into a concept, use the rest of the sentence to decide what each concept means in context, then speak one word at a time in the target language, keeping everything they have already said in mind. Transformers do roughly the same thing, but with numbers.
The six steps:
- Tokenization: your text is split into small chunks called tokens (whole words or pieces of words), each mapped to an integer ID.
- Embedding: each token ID is turned into a vector of numbers.
- Attention: each token "looks at" the other tokens (in GPT-style models, only the ones before it) to decide which matter for its meaning in this context.
- Stacked layers: this attention process repeats across many layers, each refining the representation produced by the one before.
- Next-token prediction: the model outputs a probability for every token in its vocabulary being the next one.
- Sampling and loop: one token is sampled according to those probabilities, appended to the input, and the whole process runs again until the model emits a stop token or hits a length limit.
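The loop in the last step is easier to see in code. The sketch below is a toy, not how ChatGPT is implemented: the five-word `VOCAB`, the whitespace `tokenize`, and the `toy_model` function are all hypothetical stand-ins invented for illustration. In particular, `toy_model` replaces the embedding, attention, and stacked-layer steps with a fixed rule, so only tokenization, next-token probabilities, and the sample-and-append loop are shown faithfully.

```python
import math
import random

# Hypothetical toy vocabulary; real tokenizers use tens of thousands of subword tokens.
VOCAB = ["<eos>", "the", "cat", "sat", "down"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}


def tokenize(text):
    """Step 1 (simplified): split on whitespace and map each word to an integer ID."""
    return [TOKEN_TO_ID[word] for word in text.lower().split()]


def toy_model(token_ids):
    """Stand-in for the embedding, attention, and layer steps.

    Returns one score (logit) per vocabulary entry. This placeholder simply
    favours the token that follows the last one in VOCAB order.
    """
    favoured = (token_ids[-1] + 1) % len(VOCAB)
    return [3.0 if i == favoured else 0.0 for i in range(len(VOCAB))]


def softmax(logits):
    """Step 5: turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt, max_new_tokens=10, seed=0):
    """Step 6: sample a token, append it, and repeat until a stop signal."""
    rng = random.Random(seed)
    ids = tokenize(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(toy_model(ids))
        next_id = rng.choices(range(len(VOCAB)), weights=probs, k=1)[0]
        if VOCAB[next_id] == "<eos>":  # stop token ends generation
            break
        ids.append(next_id)  # the new token becomes part of the next input
    return " ".join(VOCAB[i] for i in ids)


print(generate("the cat"))
```

Note the two exits from the loop: the sampled stop token and the `max_new_tokens` cap. Because the next token is sampled rather than always taken as the most likely one, a different `seed` can produce a different continuation from the same prompt.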
That is ChatGPT. Everything else (RAG, fine-tuning, agents) wraps or augments these six steps.
The model is a stack of structurally identical layers (often 32 to 96 in modern LLMs), each with its own learned weights. Each layer has two sub-layers: self-attention and a feed-forward network. The output of each layer is the input to the next. After the last layer, a linear projection maps the result back to vocabulary size, and a softmax turns those scores into probabilities.
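To make the layer stack concrete, here is a minimal sketch in plain Python. It rests on several assumptions: the weights are random rather than trained, attention uses a single head, and the residual connections and normalization found in real models are left out. The sizes (`vocab_size=10`, `d_model=8`, `n_layers=4`) and all function names are hypothetical, chosen only to keep the example small.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head causal attention: position i only looks at positions <= i."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    out = []
    for i in range(len(x)):
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        weights = softmax(scores)  # how much position i attends to each j
        out.append([sum(w * v[j][c] for j, w in enumerate(weights))
                    for c in range(d)])
    return out


def feed_forward(x, w1, w2):
    """Two linear maps with a ReLU in between, applied to each position."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)


def init_params(vocab_size=10, d_model=8, n_layers=4, seed=0):
    """Random (untrained) weights; every layer gets its own matrices."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

    layers = [{"wq": mat(d_model, d_model), "wk": mat(d_model, d_model),
               "wv": mat(d_model, d_model), "w1": mat(d_model, 4 * d_model),
               "w2": mat(4 * d_model, d_model)} for _ in range(n_layers)]
    return {"embed": mat(vocab_size, d_model), "layers": layers,
            "unembed": mat(d_model, vocab_size)}


def forward(token_ids, params):
    x = [params["embed"][t] for t in token_ids]  # one vector per token
    for layer in params["layers"]:  # output of one layer feeds the next
        x = self_attention(x, layer["wq"], layer["wk"], layer["wv"])
        x = feed_forward(x, layer["w1"], layer["w2"])
    logits = matmul([x[-1]], params["unembed"])[0]  # project to vocabulary size
    return softmax(logits)  # probabilities for the next token


probs = forward([1, 2, 3], init_params())
print(len(probs), round(sum(probs), 6))
```

Only the vector at the last position is projected to vocabulary size here, because that is the position whose output predicts the next token.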
Quick recall
Recite the six steps without looking.
Quiz time
1. Generation stops when