Tech Stack and Tools

The full, opinionated 2026 stack for this course. Use it as your "what to install and why" reference.

TL;DR

Python 3.11+ | Streamlit | OpenAI / Gemini / Anthropic APIs
ChromaDB or pgvector | tiktoken | python-dotenv
LangChain (light) | Hugging Face (optional) | FastAPI (light intro)
GitHub | VS Code or Cursor | Colab or Replit for fast experiments
Streamlit Cloud | Hugging Face Spaces | Vercel | Railway

Languages and runtimes

Python 3.11+ is the default. Easy install via python.org or pyenv.
Node 20+ if you take the Next.js stretch. Install via nvm.

Environments

Local: VS Code (free) or Cursor (AI-first IDE).
Cloud notebooks: Google Colab (free with limits), Kaggle Notebooks, Replit.
Containers: Docker Desktop or Podman, optional.

Core Python libraries

Library	Purpose	Install
`openai`	OpenAI API client	`pip install openai`
`anthropic`	Claude API client	`pip install anthropic`
`google-genai`	Gemini API client	`pip install google-genai`
`python-dotenv`	load `.env` keys	`pip install python-dotenv`
`tiktoken`	count tokens	`pip install tiktoken`
`streamlit`	UI framework	`pip install streamlit`
`chromadb`	vector DB	`pip install chromadb`
`pypdf`	PDF parsing	`pip install pypdf`
`pymupdf`	better PDF parsing	`pip install pymupdf`
`numpy`	vector math	`pip install numpy`
`tenacity`	retries with backoff	`pip install tenacity`
`pydantic`	data validation	`pip install pydantic`
`fastapi`	backend API (light intro)	`pip install fastapi uvicorn`
`langchain` / `langchain_openai`	optional orchestration	`pip install langchain langchain-openai`
`sentence-transformers`	self-hosted embeddings	`pip install sentence-transformers`
`rank-bm25`	keyword search	`pip install rank-bm25`

LLM providers and free tiers

Free-tier numbers change frequently. Always check the provider dashboard for the current quota; the snapshot below reflects publicly documented limits at time of writing.

Provider	Free tier signal	Card required	When to pick
Google AI Studio (Gemini)	Free tier on Flash and Flash-Lite models, roughly 5 to 15 requests/min with a 250K-token/min ceiling shared across models	No	Default free choice for this course
Groq	~30 requests/min, ~6K tokens/min, ~14.4K requests/day on most hosted models (Llama 3.x, Gemma)	No	Fastest free inference, open weights
Hugging Face Inference Providers	Free credit + serverless inference on many open models	No	Open-model experiments
OpenAI	Trial credit on new accounts	Yes	Tutorials and SDK familiarity
Anthropic Claude	Trial credit on new accounts	Yes	Long-context document analysis
Together AI	$5 minimum top-up to access platform; startup credits available	Yes	Open-source model hosting at scale
Cohere	Generous free tier on multilingual embeddings	Yes	Multilingual RAG

Tip: at the start of every project, set a hard spending limit in any provider dashboard that takes a card. For this course, the Google AI Studio + Groq combo gets most learners to the capstone without spending anything.

Embeddings

Model	Provider	Free path	Strengths
`text-embedding-3-small`	OpenAI	Paid only	Cheap, balanced, the default
`text-embedding-3-large`	OpenAI	Paid only	Higher quality, larger dim
`voyage-3`	Voyage AI	Free tier, generous	Often top of MTEB
`embed-multilingual-v3`	Cohere	Free tier	Multilingual strength
`bge-base-en-v1.5`	BAAI (open)	Free, self-host via `sentence-transformers`	Strong English baseline
`nomic-embed-text-v1.5`	Nomic (open)	Free, self-host	Long-context embeddings
`mxbai-embed-large`	Mixedbread (open)	Free, self-host	Solid quality, small model

Going fully free: install sentence-transformers, load BAAI/bge-base-en-v1.5, and call model.encode(...). Runs on CPU on any laptop. No API key, no cost.

Vector databases

DB	Free path	Best for
ChromaDB	Free, embedded or self-host (Apache 2.0)	Prototypes, this course's PDF chatbot
FAISS	Free, in-process library	Research, fast in-memory similarity
pgvector	Free, Postgres extension; works on Supabase free tier (500 MB)	Teams already on Postgres
Qdrant	Free, self-host; managed cloud has a free tier	Production, hybrid search
Weaviate	Free, self-host; managed cloud has a free tier	Production, native hybrid
Milvus	Free, self-host	Massive scale
Pinecone	Paid (limited free starter exists, often gated)	Easiest managed scaling

Run-it-yourself stack (no API key)

For learners who want to skip API providers entirely:

Ollama (ollama.com) lets you download and run Llama 3.x, Mistral, Gemma 3, Phi, Qwen, and DeepSeek-R1 locally. Works on macOS, Linux, and Windows. 7B to 8B models run comfortably on 16 GB of RAM. Zero per-token cost, zero data leaving your machine, fully offline after the initial download.
LM Studio is a GUI alternative to Ollama, helpful if you prefer a click-through interface to evaluate models.
llama.cpp is the lower-level engine both projects build on; advanced learners use it directly for quantization control.

A typical free-stack project: Ollama (Llama 3.1 8B) plus sentence-transformers for embeddings plus ChromaDB for retrieval plus Streamlit for the UI. Every component is open source, runs on a laptop, and costs nothing per query.

Deployment platforms

Platform	App types	Free tier reality
Streamlit Cloud	Streamlit	Free, sleeps when idle, ~1 GB RAM
Hugging Face Spaces	Streamlit, Gradio, Docker, Static	Free CPU; paid GPU on demand
Vercel	Next.js, Edge Functions	Generous free tier on the Hobby plan
Railway	FastAPI, workers, DBs	$5 free trial credit on signup, then usage-based
Fly.io	Containers, long-running	Free quota on small machines, pay beyond
Render	Web services, workers	Free tier exists but web services sleep aggressively
Cloudflare Workers / Pages	Edge functions, static sites	100K requests/day free

Observability and monitoring

Tool	Purpose
`logging` (stdlib)	Local logs
Logtail / Better Stack	Hosted log aggregation (free tier)
Helicone	LLM-specific observability proxy
LangSmith	LangChain-native tracing
Phoenix (Arize)	Open-source LLM tracing

Eval frameworks

Tool	Strength
Promptfoo	Side-by-side prompt eval
Ragas	RAG-specific metrics
TruLens	Production LLM evals
OpenAI Evals	Open-source eval harness
LangSmith Evals	LangChain-native

Setup commands

Create a project from zero:

mkdir my-ai-app && cd my-ai-app
python -m venv .venv
source .venv/bin/activate          # Mac/Linux
.venv\Scripts\activate            # Windows

pip install streamlit openai python-dotenv tiktoken chromadb pypdf tenacity pydantic

echo ".env" > .gitignore
echo ".venv/" >> .gitignore
echo "chroma_db/" >> .gitignore
echo "data/" >> .gitignore

git init

.env:

OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...
ANTHROPIC_API_KEY=sk-ant-...

Verify:

python -c "from openai import OpenAI; print(OpenAI().models.list().data[0].id)"

Cost-aware defaults

Default model for prototypes: gpt-4o-mini (or gemini-2.5-flash on the free tier)
Default embedding model: text-embedding-3-small (or BAAI/bge-base-en-v1.5 self-hosted)
Default chunk size: 500 tokens with 80 overlap
Default top-K: 5
Default max_tokens for chat: 600 to 800
Default temperature: 0.4 for factual, 0.8 for creative

The fully free path

You can finish this entire course without spending a rupee. Three valid free routes, all interchangeable per module:

Route A: Free hosted APIs (no credit card)

LLM: Google AI Studio (Gemini Flash family). Free tier on ai.google.dev. Recently tightened to about 5 to 15 requests per minute with a shared 250K tokens-per-minute cap, which is fine for learning. No credit card required.
LLM (open weights, fastest): Groq at console.groq.com. Around 30 requests/min, 6K tokens/min, 14K requests/day on hosted Llama and Gemma models. No credit card required.
Embeddings: Cohere's free tier on embed-multilingual-v3 (1,000 requests/min on the trial key), or self-host BAAI/bge-base-en-v1.5 via sentence-transformers.
Vector DB: ChromaDB locally. Zero cost, persistent on disk.
Deploy: Streamlit Cloud (auto-deploys from GitHub) or Hugging Face Spaces.

Route B: 100% local (no internet after install)

LLM: Ollama at ollama.com. Run Llama 3.1 8B, Mistral 7B, Gemma 3, Phi-4, Qwen, or DeepSeek-R1 on your laptop. 16 GB RAM handles 7 to 8B models comfortably.
Embeddings: sentence-transformers with BAAI/bge-base-en-v1.5, runs on CPU.
Vector DB: ChromaDB local mode.
Deploy: localhost during development, then containerize via Docker if needed.

Route C: Hybrid (free hosted LLM + local everything else)

LLM: Groq for speed, Gemini for long context, Hugging Face Inference Providers for variety. All free.
Embeddings: Self-host bge-base-en-v1.5.
Vector DB: ChromaDB local, or pgvector on Supabase's free tier (500 MB) if you want it cloud-hosted.
Deploy: Streamlit Cloud or Vercel free tier.

The PDF chatbot capstone (Module 9) runs end-to-end on any of these three routes. Free-tier rate limits change frequently, so always confirm the live numbers on the provider dashboard before relying on them in production.

What we deliberately avoid in this course

Heavy ML training (no PyTorch deep dives at beginner level)
Custom transformer implementations
Manual fine-tuning runs (better in the intermediate course)
AWS / GCP / Azure deep provisioning (overkill for beginners)
Kubernetes (not for first apps)

Optional power-ups

Cursor and Windsurf: AI-pair-programming IDEs with free tiers.
Continue.dev: open-source coding assistant that connects to any model (Ollama, OpenAI, Groq).
Aider: free CLI pair programmer that works with Ollama, Gemini, Groq, OpenAI.
Supabase: Postgres + auth + storage + pgvector on a free tier (500 MB DB, 1 GB storage).
n8n (self-hosted): free workflow automation for AI agents.