Lesson 7.5: Sketching a production RAG architecture

A whiteboard sketch is what gets your RAG idea funded. This lesson hands you the canonical 2026 architecture.

The architecture is the blueprint. Even if you do not pour the foundation yourself, you must be able to draw the house.

A canonical 2026 production RAG architecture has:

1. Ingestion service: pulls and parses sources (PDFs, web scrapes, DB exports).
2. Chunker: splits documents with metadata.
3. Embedder: calls an embedding API.
4. Vector DB: stores vectors + metadata (Pinecone, Weaviate, Chroma, Qdrant, pgvector).
5. Hybrid search: combines vector similarity with keyword/BM25 search.
6. Reranker: re-orders top results with a small cross-encoder model.
7. LLM generator: produces the final answer.
8. Eval and feedback loop: logs queries, retrievals, answers, user thumbs.
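To make the chunker component concrete, here is a minimal sketch of splitting a document while keeping metadata attached to every chunk. It assumes fixed-size word windows with overlap, which is only one of several possible chunking strategies; the `Chunk` fields and the `chunk_document` name are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str       # where the chunk came from, e.g. a file name
    chunk_index: int  # position of the chunk within its source
    start_word: int   # offset of the first word, useful for citations


def chunk_document(text: str, source: str, size: int = 50, overlap: int = 10) -> list[Chunk]:
    """Split text into overlapping word windows, each tagged with metadata."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for index, start in enumerate(range(0, len(words), step)):
        window = words[start:start + size]
        chunks.append(Chunk(" ".join(window), source, index, start))
        if start + size >= len(words):
            break  # this window already reached the end of the document
    return chunks
```

The metadata is what later lets the vector DB filter by source and lets the generator cite where an answer came from.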

Skip components 5 and 6 (hybrid search and the reranker) for v1. Add them when retrieval quality becomes the limiting factor.

A minimal v1 you can ship this month: Loader -> Chunker -> OpenAI embeddings -> ChromaDB -> top-K cosine similarity search -> GPT-4o-mini.
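The shape of that v1 flow can be sketched without any external services. The block below is a toy under stated assumptions, not the real stack: a bag-of-words counter stands in for OpenAI embeddings, an in-memory list stands in for ChromaDB, and `build_prompt` only assembles the text a real system would send to GPT-4o-mini. All class and function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding API: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Toy stand-in for a vector DB: a list of (vector, chunk) pairs."""

    def __init__(self) -> None:
        self.rows: list[tuple[Counter, str]] = []

    def add(self, chunks: list[str]) -> None:
        self.rows.extend((embed(c), c) for c in chunks)

    def top_k(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text a real v1 would send to the generator model."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


index = InMemoryIndex()
index.add([
    "Refunds are processed within five business days.",
    "The API rate limit is 100 requests per minute.",
    "Support is available by email on weekdays.",
])
question = "What is the API rate limit?"
prompt = build_prompt(question, index.top_k(question, k=1))
```

Swapping each stand-in for the real service changes the implementation of one function or class, not the overall flow, which is why the sketch is worth getting right first.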

A v2 production system: add hybrid search, reranking (Cohere or Voyage reranker), evaluation harness, and a cache.
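One way to picture the hybrid search step in v2 is as a merge of two ranked lists, one from vector search and one from keyword/BM25 search. The lesson does not say how the lists are combined; the sketch below assumes reciprocal rank fusion (RRF), a common choice, with `k = 60` as a commonly used constant. The document IDs are made up for illustration.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs; higher fused score ranks first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it returns.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


vector_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical vector-search results
keyword_hits = ["doc_c", "doc_a", "doc_d"]  # hypothetical BM25 results
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents that appear in both lists rise to the top of `fused`, and that short list is what a reranker would then re-order.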

Visualize it

A boxes-and-arrows architecture diagram with 8 boxes for v2 and a "v1" subset highlighted.

Try it now

Pick a real problem (e.g., chat with the GeekHub docs). Sketch v1 and v2. Mark which components you would ship first.

Hands-on lab

Draw the architecture on paper. Photograph it. Save it for Module 9 where you build the real thing.

Try it now

Why is a reranker useful even when top-K retrieval is "good enough" already?

Common mistakes

Designing v2 before v1 is shipped
Skipping the eval and feedback loop (no way to improve)
Picking a vector DB before you know your data shape
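To avoid the second mistake above, the eval and feedback loop can start as a single log record per query. This is a minimal sketch assuming a JSON Lines log; the `QueryLog` fields mirror what the lesson says to capture (queries, retrievals, answers, user thumbs), but the names themselves are hypothetical.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class QueryLog:
    query: str
    retrieved_ids: list[str]
    answer: str
    thumbs_up: Optional[bool] = None  # None until the user gives feedback
    timestamp: float = field(default_factory=time.time)


def to_jsonl_line(record: QueryLog) -> str:
    """Serialize one record as a single JSON line for an append-only log."""
    return json.dumps(asdict(record))


def thumbs_up_rate(records: list[QueryLog]) -> Optional[float]:
    """Share of rated answers that got a thumbs up; None if nothing is rated."""
    rated = [r for r in records if r.thumbs_up is not None]
    if not rated:
        return None
    return sum(r.thumbs_up for r in rated) / len(rated)
```

Even this much gives you a number to watch when you later add hybrid search or a reranker.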

Debugging tip

If you cannot draw the architecture in 5 minutes, you do not understand the system well enough to build it.

Challenge

Pick a real product idea. Draw v1 and v2 architectures. Estimate cost per query for each.
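A back-of-the-envelope cost estimate can be sketched as below. Every price and token count here is a placeholder assumption, not a real vendor rate; substitute current pricing for the models you actually pick. The `rerank_fee` parameter is a hypothetical flat per-query charge that models the extra v2 reranking step.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """Hypothetical USD prices per one million tokens; replace with real ones."""
    embed_per_m: float
    llm_input_per_m: float
    llm_output_per_m: float


def cost_per_query(prices: Prices, query_tokens: int, context_tokens: int,
                   answer_tokens: int, rerank_fee: float = 0.0) -> float:
    """Estimate the USD cost of answering one query."""
    embed = query_tokens * prices.embed_per_m / 1_000_000
    llm_in = (query_tokens + context_tokens) * prices.llm_input_per_m / 1_000_000
    llm_out = answer_tokens * prices.llm_output_per_m / 1_000_000
    return embed + llm_in + llm_out + rerank_fee


placeholder = Prices(embed_per_m=0.10, llm_input_per_m=1.00, llm_output_per_m=4.00)
v1_cost = cost_per_query(placeholder, query_tokens=50, context_tokens=2000, answer_tokens=300)
v2_cost = cost_per_query(placeholder, query_tokens=50, context_tokens=2000, answer_tokens=300,
                         rerank_fee=0.002)
```

With numbers like these, the retrieved context usually dominates the input cost, so the value of K and the chunk size are the first things to tune.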

Where this shows up

Internal Q&A assistants
Public-facing docs bots
Codebase-aware coding assistants

From the field

The single most underrated skill is being able to sketch a clear architecture in a meeting. It is half of the senior engineer interview at AI companies.

Recap

8 components, v1 to v2 progression, sketchable in 5 minutes. You are now RAG-architecture literate.

Quick recall

Name 8 components of a production RAG system.

Quiz time

1. A reranker is typically