The 5-step RAG pipeline

If you can recite the 5 steps from memory, you have a framework for designing, debugging, and explaining almost any RAG system. This lesson hands you that mental model.

Think of a library system: catalog the books, write index cards, store the cards, look up by topic, hand the right cards to the reader. RAG follows the same five steps.

The 5 steps:

  1. Load: pull documents from their source (PDF, web, DB).
  2. Chunk: split documents into small overlapping pieces.
  3. Embed: convert each chunk to a vector.
  4. Store: index the vectors in a vector database.
  5. Retrieve and Generate: embed the user question, find nearest chunks, stuff them into the prompt, generate.

Steps 1 to 4 happen offline (or at ingest time). Step 5 happens per query.

Offline ingest pipeline:

PDFs -> text -> chunks of ~500 tokens with 50-token overlap -> embeddings -> ChromaDB
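
The chunking stage above can be sketched as a sliding window. This is a minimal illustration, not a production splitter: `chunk_tokens` is a hypothetical helper, and whitespace-separated words stand in for real tokenizer output, so the counts only approximate true token counts.

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    stride = chunk_size - overlap  # how far the window advances each step
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


# Assumption: words stand in for tokens in this toy 1200-"token" document.
text = " ".join(f"w{i}" for i in range(1200))
chunks = chunk_tokens(text.split())
print([len(c) for c in chunks])           # [500, 500, 300]
print(chunks[0][-50:] == chunks[1][:50])  # True: neighbours share 50 tokens
```

Each chunk would then be embedded and written to the vector store together with its text.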

Online query pipeline:

user query -> embedding -> top-K chunk lookup -> prompt = [system + top chunks + user query] -> LLM -> answer
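
The query flow above, sketched end to end with stand-ins. Everything here is an assumption made for illustration: `embed` is a toy bag-of-words counter rather than a real embedding model, the "vector database" is a plain list, and the final LLM call is left out, so the sketch stops at the assembled prompt.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, store, top_k=4):
    """Return the top_k chunk texts ranked by similarity to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def build_prompt(system, chunks, query):
    """Assemble prompt = system + top chunks + user query."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return f"{system}\n\nContext:\n{context}\n\nQuestion: {query}"


chunks = [
    "Refunds are issued within 14 days of a return.",
    "Our office is closed on public holidays.",
    "Shipping to Europe takes 5 to 7 business days.",
]
store = [(c, embed(c)) for c in chunks]  # what the ingest side would have built

query = "How long does shipping take?"
top = retrieve(query, store, top_k=2)
prompt = build_prompt("Answer using only the context.", top, query)
print(prompt)  # this string is what would be sent to the LLM
```

With a real embedding model and vector store, only `embed` and `retrieve` change; the prompt assembly stays the same.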

Default starting values: chunk_size=500 tokens, overlap=50 tokens, top_K=4 to 6 chunks.
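
To get a feel for what these defaults imply, here is a rough back-of-the-envelope calculation. The helper names and the 50,000-token document are hypothetical; the arithmetic assumes a fixed-size sliding window.

```python
import math


def estimate_chunks(doc_tokens, chunk_size=500, overlap=50):
    """Number of sliding-window chunks needed to cover doc_tokens tokens."""
    if doc_tokens <= chunk_size:
        return 1 if doc_tokens > 0 else 0
    stride = chunk_size - overlap  # 450 new tokens per additional chunk
    return 1 + math.ceil((doc_tokens - chunk_size) / stride)


def context_budget(top_k, chunk_size=500):
    """Upper bound on prompt tokens spent on retrieved chunks."""
    return top_k * chunk_size


print(estimate_chunks(50_000))                 # 111 chunks to embed and store
print(context_budget(4), context_budget(6))    # 2000 3000 tokens of context per query
```

So with top_K between 4 and 6, each query spends roughly 2,000 to 3,000 prompt tokens on retrieved text before the system prompt and the question are added.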

Visualize it

A two-row pipeline diagram: top row "Ingest (offline)" with 4 boxes; bottom row "Query (online)" with 4 boxes. Arrows show data flow.

Try it now

Take any document. Manually chunk it into 300-word pieces. Then write a question about it and pick, by hand, the 2 chunks you would want the model to see. Notice that you just did the retrieval part of RAG without code.

Hands-on lab

On paper, sketch the 5-step pipeline. Label each step with what data flows in and out. Take a photo.

Try it now

Why is chunk overlap useful?

Common mistakes

  • Skipping chunking and embedding whole documents (kills retrieval quality)
  • Chunking too small (loses context) or too large (pulls in irrelevant text and wastes prompt space)
  • Forgetting to filter out duplicate or near-identical chunks
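
For the last mistake, one simple way to filter near-identical chunks is to compare their word sets. This is only a sketch: the function names and the 0.9 threshold are assumptions, and the pairwise comparison is quadratic, so a large corpus would need something faster such as hashing-based methods.

```python
import re


def word_set(text):
    """Lowercased set of alphanumeric words, ignoring punctuation and spacing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Share of words two sets have in common (1.0 means identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def drop_near_duplicates(chunks, threshold=0.9):
    """Keep the first chunk of any group whose word-set overlap is >= threshold."""
    kept, kept_sets = [], []
    for chunk in chunks:
        words = word_set(chunk)
        if all(jaccard(words, seen) < threshold for seen in kept_sets):
            kept.append(chunk)
            kept_sets.append(words)
    return kept


chunks = [
    "Refunds are issued within 14 days of a return.",
    "Refunds are issued within 14 days of a return!",   # differs only in punctuation
    "REFUNDS are issued within 14 days of a  return.",  # differs only in case and spacing
    "Shipping to Europe takes 5 to 7 business days.",
]
unique = drop_near_duplicates(chunks)
print(unique)  # keeps the first refund chunk and the shipping chunk
```

Running a filter like this before embedding keeps duplicates from crowding distinct chunks out of the top-K results.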

Debugging tip

When RAG quality is poor, instrument retrieval first: log which chunks were retrieved for the failing questions. The problem is usually there, not in the generation step.
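
A minimal sketch of that instrumentation using Python's standard `logging` module. The shape of each hit (a dict with `text`, `score`, and `source`) is a hypothetical format; adapt it to whatever your retriever actually returns.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("rag.retrieval")


def log_retrieval(question, hits, preview_chars=60):
    """Log rank, score, and source for each retrieved chunk; return the log lines."""
    lines = [f"question: {question!r}"]
    for rank, hit in enumerate(hits, start=1):
        preview = hit["text"][:preview_chars].replace("\n", " ")
        lines.append(
            f"  #{rank} score={hit['score']:.3f} source={hit['source']} text={preview!r}"
        )
    for line in lines:
        log.info(line)
    return lines


# Hypothetical retriever output for a question the system answered badly.
hits = [
    {"text": "Our office is closed on public holidays.", "score": 0.41, "source": "handbook.pdf#p3"},
    {"text": "Shipping to Europe takes 5 to 7 business days.", "score": 0.39, "source": "faq.pdf#p1"},
]
lines = log_retrieval("What is the refund window?", hits)
```

In this made-up case the log shows that no refund-related chunk was retrieved at all, which points at chunking or embedding rather than at the LLM.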

Challenge

For a 100-page PDF, propose a chunking strategy with rationale: chunk size, overlap, metadata fields, expected retrieval count.

Where this shows up

  • All "chat with your docs" features
  • Internal company assistants
  • Code search bots

From the field

A senior RAG engineer's job is mostly chunk strategy, metadata design, and retrieval evaluation. The LLM is the least-tunable part.

Recap

Load, chunk, embed, store, retrieve and generate. Memorize this and you can talk RAG with anyone.


Quick recall

List the 5 steps in order.

Quiz time

  1. The first step of a RAG pipeline is
