Pretraining vs fine-tuning vs RLHF, in plain English
Almost every chat model you will use went through three training stages. Knowing them tells you what the model is good at and where it will fail.
Stage 1: read every book in the library. Stage 2: take a specialized course. Stage 3: get coached by mentors on how to behave. Those stages are pretraining, fine-tuning, and RLHF, in that order.
- Pretraining: the model is fed trillions of tokens of text and learns to predict the next token. This gives it language fluency, world knowledge, and general capabilities. At frontier scale, this stage costs millions of dollars in compute.
- Supervised fine-tuning (SFT): the model is shown thousands of (instruction, ideal response) pairs. This teaches it to follow instructions and answer in the desired format.
- RLHF (Reinforcement Learning from Human Feedback): humans rank multiple model responses to the same prompt. A reward model learns to predict those rankings. The model is then updated with reinforcement learning to produce responses the reward model scores highly. This stage aims to make the model helpful, harmless, and honest.
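The training signals behind these stages can be sketched in a few lines. This is a minimal illustration, not any lab's actual training code: the token probabilities and reward scores are made-up numbers, and the pairwise loss shown is one common formulation for reward models (assumed here, not stated in the lesson).

```python
import math

def next_token_loss(probs, target):
    """Cross-entropy for one prediction: -log p(target token).
    Pretraining and supervised fine-tuning both minimize a loss of this form."""
    return -math.log(probs[target])

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss for a reward model: -log(sigmoid(r_chosen - r_rejected)).
    It is small when the human-preferred response gets the higher score."""
    margin = reward_chosen - reward_rejected
    return math.log(1.0 + math.exp(-margin))

# Hypothetical model output for the context "The cat sat on the".
predicted = {"mat": 0.6, "sofa": 0.3, "moon": 0.1}
loss_likely = next_token_loss(predicted, "mat")     # true token was expected
loss_unlikely = next_token_loss(predicted, "moon")  # true token was a surprise

# Hypothetical reward scores for two responses to the same prompt.
agrees_with_humans = preference_loss(2.0, 0.0)     # preferred answer scored higher
disagrees_with_humans = preference_loss(0.0, 2.0)  # preferred answer scored lower
```

In both cases the loss is lower when the model agrees with the data: a likely next token in pretraining and fine-tuning, or the human-preferred ranking in the reward model.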
Some modern frontier models also use RLAIF (Reinforcement Learning from AI Feedback, where an AI model supplies the rankings in place of humans) and constitutional AI to scale the RLHF stage.
You will rarely pretrain a model yourself because the cost is prohibitive. You may fine-tune one if you have roughly 100 to 10,000 high-quality examples. You will almost never run RLHF yourself for a chat model. Most production teams stop at fine-tuning, often using LoRA (low-rank adaptation), which trains small adapter matrices instead of all the model's weights, to keep costs low.
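To see why LoRA keeps costs low, it helps to count trainable parameters for a single weight matrix. The sketch below assumes a 4096 x 4096 layer and a rank of 8; both numbers are illustrative choices, not values from this lesson.

```python
def full_finetune_params(d_out, d_in):
    """Full fine-tuning updates every entry of the weight matrix W."""
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    """LoRA freezes W and trains two small matrices:
    B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)

# Illustrative sizes (assumptions): one square layer, rank-8 adapter.
d_out, d_in, rank = 4096, 4096, 8

full = full_finetune_params(d_out, d_in)  # 16,777,216
lora = lora_params(d_out, d_in, rank)     # 65,536
fraction = lora / full                    # 0.00390625, about 0.39%

print(f"full: {full:,}  lora: {lora:,}  trainable fraction: {fraction:.2%}")
```

Under these assumptions the adapter trains well under 1% of the layer's parameters, which is where the memory and cost savings come from.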
Quick recall
Define pretraining, fine-tuning, and RLHF.
Quiz time
1. To make the model answer in a specific JSON format every time, your first move should be