Lesson 9.1: Project setup and architecture review | GeekHub Learn

Five minutes of planning saves five hours of refactoring. We sketch first, code second.

Before building a kitchen, you draw it. Before building a RAG app, you sketch the pipeline.

The architecture (drawn in Module 7.5):

PDFs -> text -> chunks -> embeddings -> ChromaDB
                                         |
user question -> embed -> top-K -> prompt with chunks -> LLM -> answer + citations
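
To make the two flows concrete, here is a toy sketch of the query path that runs without any API key. It is an illustration only: the bag-of-words `embed` function stands in for a real embedding model, a plain list stands in for ChromaDB, and every name in it (`embed`, `cosine`, `top_k`, `build_prompt`, the sample chunks) is hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, chunks: list[dict], k: int = 2) -> list[dict]:
    """Return the k chunks most similar to the question."""
    q_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c["text"])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, hits: list[dict]) -> str:
    """Assemble the prompt, tagging each chunk with filename and page for citations."""
    context = "\n".join(f"[{h['source']} p.{h['page']}] {h['text']}" for h in hits)
    return (
        "Answer using only the context below and cite your sources.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Made-up chunks; in the real app these come from the ingest path.
chunks = [
    {"text": "ChromaDB stores embeddings on local disk", "source": "notes.pdf", "page": 1},
    {"text": "Streamlit renders the chat interface", "source": "notes.pdf", "page": 2},
    {"text": "Bananas are rich in potassium", "source": "food.pdf", "page": 7},
]

question = "where does chromadb store embeddings"
hits = top_k(question, chunks, k=1)
prompt = build_prompt(question, hits)
```

The last step, sending `prompt` to the LLM, is left out on purpose. The point is the shape of the data: each chunk carries its source and page all the way to the prompt, which is what makes citations possible.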

Decisions to lock in upfront:

Embedding model: text-embedding-3-small (cheap, capable)
LLM: gpt-4o-mini (cheap, capable)
Vector DB: ChromaDB (local, persistent)
UI: Streamlit
Citations: include source filename and page number
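
One way to lock these decisions in is a single settings object that the rest of the code reads from. This is a sketch under assumptions: the `Settings` class, its field names, the `top_k` default of 4, and the `format_citation` helper are all hypothetical, and the block could live at the top of rag.py rather than in a new file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Upfront decisions in one place; frozen so they cannot drift at runtime."""

    embedding_model: str = "text-embedding-3-small"
    llm_model: str = "gpt-4o-mini"
    chroma_path: str = "chroma_db"  # local, persistent vector store folder
    data_dir: str = "data"          # uploaded PDFs
    top_k: int = 4                  # assumption: the lesson does not fix K


SETTINGS = Settings()


def format_citation(filename: str, page: int) -> str:
    """Render a citation from the source filename and page number."""
    return f"{filename}, p. {page}"
```

Changing a model later then means editing one line instead of hunting for string literals across app.py and rag.py.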

Project structure:

pdf-chatbot/
  app.py            # Streamlit UI
  rag.py            # ingest + retrieve helpers
  prompts.py        # system prompt + answer template
  data/             # uploaded PDFs (gitignored)
  chroma_db/        # vector store (gitignored)
  requirements.txt
  .env              # API keys (gitignored)
  .gitignore
  README.md
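
The tree above can be created by hand, or with a small script like the following sketch. The `scaffold` function and its constants are hypothetical names; the script only creates empty files and folders and writes the three gitignored paths marked in the tree.

```python
from pathlib import Path

FILES = ["app.py", "rag.py", "prompts.py", "requirements.txt", ".env", ".gitignore", "README.md"]
DIRS = ["data", "chroma_db"]

# The three entries marked "gitignored" in the project tree.
GITIGNORE = "data/\nchroma_db/\n.env\n"


def scaffold(root: str = "pdf-chatbot") -> Path:
    """Create the project skeleton under root without overwriting existing files."""
    base = Path(root)
    for name in DIRS:
        (base / name).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        path = base / name
        if not path.exists():
            path.touch()
    gitignore = base / ".gitignore"
    # Only fill .gitignore when it is still empty, so manual edits survive a rerun.
    if gitignore.read_text() == "":
        gitignore.write_text(GITIGNORE)
    return base
```

Calling `scaffold()` from the parent directory creates pdf-chatbot/ with the layout shown. Running it a second time changes nothing.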

Visualize it

Put a file-tree diagram side by side with the runtime architecture diagram. The wiring between files and pipeline stages becomes obvious.

Try it now

Create the folder structure. Run git init, write .gitignore, and push the empty repo. 10 minutes.

Hands-on lab

Set up the project skeleton above. Create empty stub functions in rag.py. Commit.
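
A possible starting point for the stubs, assuming three helpers that mirror the pipeline. The function names and signatures (`ingest_pdf`, `retrieve`, `answer`) are hypothetical; pick whatever names match your own sketch.

```python
"""rag.py: ingest + retrieve helpers (stubs only, no logic yet)."""


def ingest_pdf(pdf_path: str) -> int:
    """PDF -> text -> chunks -> embeddings -> vector store.

    Intended to return the number of chunks stored.
    """
    raise NotImplementedError("ingest_pdf is a stub")


def retrieve(question: str, k: int = 4) -> list[dict]:
    """Embed the question and return the top-k chunks.

    Each chunk is intended to carry its text, source filename, and page number.
    """
    raise NotImplementedError("retrieve is a stub")


def answer(question: str) -> str:
    """Build the prompt from retrieved chunks, call the LLM, return answer + citations."""
    raise NotImplementedError("answer is a stub")
```

Raising `NotImplementedError` instead of using `pass` means a stub that gets called too early fails loudly rather than returning `None`.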

Try it now

Why is splitting app.py from rag.py worth it even for a small project?

Common mistakes

One giant app.py with everything mixed (hard to test without launching the UI)
Skipping .gitignore (you will leak chroma_db or .env)
No README.md (cannot show recruiters)
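
To illustrate the first mistake: when a helper lives in rag.py and does not touch Streamlit, it can be checked with a plain assert-based test. The sketch below assumes a hypothetical `chunk_text` helper using character-based chunks with overlap; the lesson does not prescribe a chunking strategy.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character chunks of at most `size`, sharing `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop early enough that the final chunk is not just a repeat of the overlap.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, last_start, step)]


def test_chunks_overlap() -> None:
    """Runs with plain Python or pytest; no UI and no API key involved."""
    chunks = chunk_text("abcdefghij", size=4, overlap=1)
    assert chunks == ["abcd", "defg", "ghij"]
    # Each chunk starts with the last character of the previous one.
    assert all(a[-1] == b[0] for a, b in zip(chunks, chunks[1:]))


test_chunks_overlap()
```

The same function buried inside a Streamlit callback could only be exercised by starting the app and uploading a file.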

Debugging tip

If you cannot draw and explain your structure in 60 seconds, restructure now. Restructuring now is faster than restructuring later.

Challenge

Write the README's "Architecture" section in 200 words with a Markdown diagram.

Where this shows up

All production AI apps
Demo projects for jobs
Open-source RAG starters

From the field

A clean file structure on a GitHub repo is the first signal recruiters scan. Most beginners ship a mess. You will not.

Recap

Sketch, scaffold, gitignore, README. Then code.

Quick recall

Why split UI from RAG logic?

Quiz time

1. The vector DB folder should be