Next-token prediction: ChatGPT is really autocomplete
The whole magical thing is a glorified autocomplete. Once that lands, everything from hallucinations to creativity makes sense.
You type "I am going to the gym to..." and your phone suggests "work out", "lift weights", "exercise". That is next-token prediction. ChatGPT does the same, but with a far larger model that reads the entire context (GPT-3, for example, stacks 96 transformer layers) and a vocabulary on the order of 100,000 to 200,000 tokens.
At every step, the model outputs a probability for each token in its vocabulary. The token with the highest probability is the "most likely next token". The system picks one (often not strictly the top, see "temperature") and appends it. Then it does it again. And again. Until a stop token.
The seemingly intelligent paragraph is really 800 sequential autocompletes.
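The loop described above can be sketched in a few lines. This is a toy illustration, not how ChatGPT is implemented: `TOY_MODEL`, `next_token_probs`, and `generate` are hypothetical names, and the "model" is a hand-written lookup table that only looks at the last token, whereas a real LLM conditions on the whole context.

```python
import random

# Hypothetical toy "model": maps the last token to a distribution over next tokens.
# A real LLM computes this distribution from the entire context with a neural network.
TOY_MODEL = {
    "<start>": {"I": 1.0},
    "I": {"am": 0.9, "go": 0.1},
    "am": {"going": 0.8, "tired": 0.2},
    "going": {"to": 1.0},
    "to": {"the": 0.7, "work": 0.3},
    "the": {"gym": 0.6, "store": 0.4},
    "gym": {"<stop>": 1.0},
    "store": {"<stop>": 1.0},
    "work": {"<stop>": 1.0},
    "go": {"<stop>": 1.0},
    "tired": {"<stop>": 1.0},
}

def next_token_probs(context):
    """Return {token: probability} for the next token (toy: uses only the last token)."""
    return TOY_MODEL[context[-1]]

def generate(rng, max_tokens=20, greedy=False):
    """Predict a token, append it, and repeat until a stop token or the length cap."""
    context = ["<start>"]
    for _ in range(max_tokens):
        probs = next_token_probs(context)
        tokens = list(probs)
        if greedy:
            # Always take the most likely token.
            token = max(tokens, key=probs.get)
        else:
            # Sample in proportion to the probabilities.
            token = rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
        if token == "<stop>":
            break
        context.append(token)
    return context[1:]  # drop the <start> marker

print(" ".join(generate(random.Random(0), greedy=True)))  # I am going to the gym
print(" ".join(generate(random.Random(1))))               # a sampled continuation
```

Each pass through the loop is one "autocomplete"; a long answer is just many passes.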
The model emits a vector of "logits" of length vocab_size. A softmax turns it into a probability distribution. Sampling strategies include:
- Greedy (temperature 0): always pick the top token. Deterministic, sometimes repetitive.
- Temperature sampling: divide logits by T and sample. T > 1 is wilder, T < 1 is more conservative.
- Top-k: only sample from the top k tokens.
- Top-p (nucleus): only sample from the smallest set of top tokens whose cumulative probability reaches p.
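The strategies in this list can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the four-token vocabulary and its logits are made up, all function names are hypothetical, and production systems apply the same ideas to vectors of length vocab_size using tensor libraries.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities. temperature must be > 0; use greedy() for T = 0."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    """Index of the top token (the temperature-0 limit)."""
    return max(range(len(logits)), key=lambda i: logits[i])

def renormalize(probs, keep):
    """Zero out every index not in keep and rescale the rest to sum to 1."""
    keep_set = set(keep)
    total = sum(probs[i] for i in keep_set)
    return [probs[i] / total if i in keep_set else 0.0 for i in range(len(probs))]

def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return renormalize(probs, order[:k])

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize(probs, keep)

def sample(probs, rng):
    """Draw one token index in proportion to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Made-up example values, for illustration only.
vocab = ["work out", "lift weights", "exercise", "sleep"]
logits = [2.0, 1.0, 0.5, -1.0]
rng = random.Random(0)

print("greedy:", vocab[greedy(logits)])
for T in (0.5, 1.0, 2.0):
    print(f"T={T}:", [round(q, 3) for q in softmax(logits, T)])
probs = softmax(logits)
print("top-k (k=2):", vocab[sample(top_k_filter(probs, 2), rng)])
print("top-p (p=0.9):", vocab[sample(top_p_filter(probs, 0.9), rng)])
```

Printing the distributions for several values of T shows the effect directly: a low T concentrates probability on the top token, and a high T spreads it across the alternatives.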