Module 9: Building a PDF Chatbot (RAG Project) · Lesson 9.1 (1 of 8 in this module)

Project setup and architecture review

Five minutes of planning saves five hours of refactoring. We sketch first, code second.

Before building a kitchen, you draw it. Before building a RAG app, you sketch the pipeline.

The architecture (drawn in Module 7.5):

PDFs -> text -> chunks -> embeddings -> ChromaDB
                                         |
user question -> embed -> top-K -> prompt with chunks -> LLM -> answer + citations
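
To make the two rows of the diagram concrete, here is a minimal, dependency-free sketch of the same flow. It is an illustration under loud assumptions: `toy_embed` (word counts) stands in for a real embedding model, a plain Python list stands in for ChromaDB, the page texts are made up, and every function name is hypothetical. The LLM call is left out; the sketch stops at the prompt.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 50) -> list[str]:
    """Split text into chunks of at most `size` words (no overlap, for brevity)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def toy_embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, store: list[dict], k: int = 2) -> list[dict]:
    """Return the k stored chunks most similar to the question."""
    q = toy_embed(question)
    return sorted(store, key=lambda r: cosine(q, r["vector"]), reverse=True)[:k]


def build_prompt(question: str, hits: list[dict]) -> str:
    """Assemble the prompt: retrieved chunks tagged with source and page."""
    context = "\n".join(f"[{h['source']} p.{h['page']}] {h['text']}" for h in hits)
    return (
        "Answer using only the context below and cite sources.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Top row of the diagram: PDFs -> text -> chunks -> embeddings -> store.
# The page texts below are invented sample data.
pages = [
    ("handbook.pdf", 1, "Employees accrue vacation days every month of service."),
    ("handbook.pdf", 2, "Expense reports are due by the fifth business day."),
]
store = []
for source, page, text in pages:
    for chunk in chunk_text(text):
        store.append(
            {"source": source, "page": page, "text": chunk, "vector": toy_embed(chunk)}
        )

# Bottom row: question -> embed -> top-K -> prompt with chunks.
question = "When are expense reports due?"
hits = top_k(question, store, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

In the real project, the embedding model and vector store replace `toy_embed` and `store`, but the order of the steps stays the same.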

Decisions to lock in upfront:

  • Embedding model: text-embedding-3-small (cheap, capable)
  • LLM: gpt-4o-mini (cheap, capable)
  • Vector DB: ChromaDB (local, persistent)
  • UI: Streamlit
  • Citations: include source filename and page number
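
One way to lock these decisions in is to record them in a single place in code. The sketch below is a suggestion, not part of the lesson's file layout: the `Settings` class and its field names are hypothetical, and `top_k = 4` is an assumed default that the lesson does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Upfront decisions in one immutable object (hypothetical helper)."""

    embedding_model: str = "text-embedding-3-small"
    llm_model: str = "gpt-4o-mini"
    chroma_dir: str = "chroma_db"  # local, persistent vector store folder
    data_dir: str = "data"         # uploaded PDFs
    top_k: int = 4                 # assumption: not specified in the lesson
    cite_fields: tuple[str, ...] = ("source", "page")  # filename + page number


SETTINGS = Settings()
print(SETTINGS)
```

Because the dataclass is frozen, an accidental reassignment such as `SETTINGS.llm_model = "other"` raises an error instead of silently changing a decision mid-project.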

Project structure:

pdf-chatbot/
  app.py            # Streamlit UI
  rag.py            # ingest + retrieve helpers
  prompts.py        # system prompt + answer template
  data/             # uploaded PDFs (gitignored)
  chroma_db/        # vector store (gitignored)
  requirements.txt
  .env              # API keys (gitignored)
  .gitignore
  README.md
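
If you prefer to script the skeleton rather than create it by hand, a short standard-library sketch like the following would do it. The `scaffold` function name and the placeholder file contents are assumptions; the folder and file names come from the tree above.

```python
from pathlib import Path

DIRS = ["data", "chroma_db"]

# Placeholder contents; replace them as the project grows.
FILES = {
    "app.py": "# Streamlit UI\n",
    "rag.py": "# ingest + retrieve helpers\n",
    "prompts.py": "# system prompt + answer template\n",
    "requirements.txt": "",
    ".env": "",
    ".gitignore": "data/\nchroma_db/\n.env\n",
    "README.md": "# pdf-chatbot\n",
}


def scaffold(root: Path) -> Path:
    """Create the project tree under `root` without overwriting existing files."""
    root.mkdir(parents=True, exist_ok=True)
    for name in DIRS:
        (root / name).mkdir(exist_ok=True)
    for name, content in FILES.items():
        path = root / name
        if not path.exists():
            path.write_text(content, encoding="utf-8")
    return root
```

Calling `scaffold(Path("pdf-chatbot"))` from the parent folder creates the tree. Running it a second time is safe because existing files are left untouched.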

Visualize it

Put a file-tree diagram side by side with the runtime architecture diagram. Seeing which file owns which pipeline stage makes the wiring obvious.

Try it now

Create the folder structure. Run git init, write .gitignore, and push the empty repo. 10 minutes.

Hands-on lab

Set up the project skeleton above. Create empty stub functions in rag.py. Commit.
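
As a starting point for the lab, the stubs in rag.py might look like the sketch below. The function names, signatures, and the `Chunk` dataclass are all assumptions made for illustration; the lesson only asks for empty stubs. Raising `NotImplementedError` instead of using `pass` is a design choice so that a forgotten stub fails loudly.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One retrievable piece of text plus the metadata needed for citations."""

    text: str
    source: str  # PDF filename
    page: int    # page number within that PDF


def ingest_pdf(path: str) -> int:
    """Extract, chunk, embed, and store one PDF. Should return the chunk count."""
    raise NotImplementedError("ingest_pdf is a stub")


def retrieve(question: str, k: int = 4) -> list[Chunk]:
    """Embed the question and return the top-k most similar chunks."""
    raise NotImplementedError("retrieve is a stub")


def answer(question: str) -> str:
    """Retrieve chunks, build the prompt, call the LLM, return answer + citations."""
    raise NotImplementedError("answer is a stub")
```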

Try it now

Why is splitting app.py from rag.py worth it even for a small project?

Common mistakes

  • One giant app.py with everything mixed (impossible to test)
  • Skipping .gitignore (you will leak chroma_db or .env)
  • No README.md (nothing to show recruiters)
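
The first mistake is easiest to see with a small example. In the sketch below, `format_citations` is a hypothetical helper of the kind that could live in rag.py; because it does not touch the UI, a plain test function can exercise it without launching the app. The hit dictionaries and their `source` and `page` keys are assumptions.

```python
def format_citations(hits: list[dict]) -> str:
    """Render unique (source, page) pairs, one per line, in first-seen order."""
    seen: list[tuple[str, int]] = []
    for hit in hits:
        key = (hit["source"], hit["page"])
        if key not in seen:
            seen.append(key)
    return "\n".join(f"- {source}, page {page}" for source, page in seen)


def test_format_citations_deduplicates() -> None:
    """A plain test: no UI, no API key, no vector store required."""
    hits = [
        {"source": "a.pdf", "page": 1},
        {"source": "a.pdf", "page": 1},
        {"source": "b.pdf", "page": 3},
    ]
    assert format_citations(hits) == "- a.pdf, page 1\n- b.pdf, page 3"


test_format_citations_deduplicates()
```

The same function buried inside a UI callback would need the whole app running before it could be checked.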

Debugging tip

If you cannot draw and explain your structure in 60 seconds, restructure now. Restructuring an empty skeleton is faster than restructuring working code later.

Challenge

Write the README's "Architecture" section in 200 words with a Markdown diagram.

Where this shows up

  • All production AI apps
  • Demo projects for jobs
  • Open-source RAG starters

From the field

A clean file structure on a GitHub repo is the first signal recruiters scan. Most beginners ship a mess. You will not.

Recap

Sketch, scaffold, gitignore, README. Then code.

