The 5-step RAG pipeline
If you can recite the 5 steps from memory, you can design, debug, and explain any RAG system in the world. This lesson hands you that mental model.
A library system: catalog the books, write index cards, store the cards, look up by topic, hand the right cards to the reader. Same five steps in RAG.
The 5 steps:
- Load: pull documents from their source (PDF, web, DB).
- Chunk: split documents into small overlapping pieces.
- Embed: convert each chunk to a vector.
- Store: index the vectors in a vector database.
- Retrieve and Generate: embed the user question, find nearest chunks, stuff them into the prompt, generate.
Steps 1 to 4 happen offline (or at ingest time). Step 5 happens per query.
Offline ingest pipeline:
PDFs -> text -> chunks of ~500 tokens with 50-token overlap -> embeddings -> ChromaDB
Online query pipeline:
user query -> embedding -> top-K chunk lookup -> prompt = [system + top chunks + user query] -> LLM -> answer
Default starting values: chunk_size=500 tokens, overlap=50 tokens, top_K=4 to 6 chunks.
Quick recall
3 prompts · think before you flip
Prompt 1 of 3
List the 5 steps in order.
Quiz time
1 question · tap an answer to check it
1. The first step of a RAG pipeline is
Finished lesson 7.2?
Mark complete to update your module progress and unlock the streak.