Module 10: Deploying AI Apps

Module Goal

Get any AI app you build live on the internet, securely and cheaply. Cover the four free and cheap platforms beginners actually need.

Estimated Duration

3 to 4 hours.

Skills Learned

Deploying Streamlit, Gradio, FastAPI, and Next.js AI apps
Managing API keys and secrets in production
Setting up monitoring and basic logs
Cost guards and rate limiting
Choosing a platform for the right reasons

Real-world Importance

A non-deployed project is invisible to recruiters, users, and your future self. Owning deployment is the difference between "I learned AI" and "I shipped AI".

Lessons in this module

The 4 platforms beginners should know
Streamlit Cloud and Hugging Face Spaces in depth
Vercel for Next.js AI apps
Railway for FastAPI backends
Secrets, logging, monitoring, cost guards

Lesson 10.1: The 4 platforms beginners should know

Hook / Why This Matters

Pick once, ship five projects. This lesson is the platform decision tree.

Beginner Analogy

Different airports for different routes. You do not pick the same one for a domestic flight and a cargo shipment.

Concept Explanation

The 4-platform map:

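App type	Best free or low-cost platform	Why
Streamlit demo	Streamlit Cloud	one-click deploy from GitHub
Gradio demo	Hugging Face Spaces	native Gradio SDK support
Next.js (frontend + serverless)	Vercel	best developer experience (DX) for Next.js
FastAPI backend	Railway or Fly.io	persistent server, env vars, simple setup (trial credit or low cost rather than permanently free)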
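
The map above can also be written as a small lookup, which is handy if you script project scaffolding. This is a minimal sketch: PLATFORM_MAP and recommend_platform are hypothetical names made up for illustration, and the values simply mirror the table.

```python
# Hypothetical lookup table mirroring the 4-platform map in this lesson.
PLATFORM_MAP = {
    "streamlit": "Streamlit Cloud",
    "gradio": "Hugging Face Spaces",
    "nextjs": "Vercel",
    "fastapi": "Railway or Fly.io",
}


def recommend_platform(app_type: str) -> str:
    """Return the recommended platform for an app type such as 'Next.js'."""
    # Normalize so that "Next.js", " nextjs " and "NEXT JS" all match.
    key = app_type.strip().lower().replace(".", "").replace(" ", "")
    if key not in PLATFORM_MAP:
        raise ValueError(f"Unknown app type: {app_type!r}")
    return PLATFORM_MAP[key]
```

For example, recommend_platform("Next.js") returns "Vercel", and an unknown type such as "django" raises ValueError instead of guessing.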

Technical Breakdown

Beyond free tiers, when usage grows:

Streamlit Cloud paid (more memory, custom domain)
Hugging Face Spaces hardware tiers (GPU)
Vercel Pro (more functions, edge)
Railway Hobby (longer-running services)

Visual Learning Suggestion

A decision tree: "What kind of app?" -> Streamlit/Gradio/Next.js/FastAPI -> recommended platform.

Interactive Element

Hands-on Lab

Make a "hello" deploy on each platform. 30 minutes total. You will reuse these skills forever.

Mini Exercise

When would you choose Hugging Face Spaces over Streamlit Cloud?

Common Mistakes

Trying to deploy a Next.js app on Streamlit Cloud
Putting a heavy backend on Vercel (use Railway/Fly)
Skipping the CLI install (faster iteration once you have it)

Debugging Tips

If you cannot decide, default to Streamlit Cloud for prototypes and Vercel + Railway for "real" apps.

Knowledge Check Questions

Which platform for a FastAPI backend?
Which for a Gradio demo?
Which for Next.js?

Quiz Questions

A Streamlit chatbot first deploy should go to: a) Streamlit Cloud b) AWS EC2 c) Vercel d) GCP VM Answer: a

Challenge Task

Make a "hello world" on all four platforms. Bookmark each dashboard.

Real-world Use Cases

Portfolio demos
Internal team tools
Public AI utilities

Industry Insight

Junior AI engineers should master deploying 4 stacks (Streamlit, Gradio, Next.js, FastAPI) on these 4 platforms. That single matrix is a moat.

Interview Questions

Where would you deploy a quick chatbot demo? Why?
What is the tradeoff between Streamlit Cloud and HF Spaces?
When would you graduate off these and onto AWS?

Summary

Four platforms. Four app types. Memorize the matrix.

Lesson 10.2: Streamlit Cloud and Hugging Face Spaces in depth

Hook / Why This Matters

The two free platforms beginners use most. Master the secrets, the dependency files, the constraints.

Beginner Analogy

These are the bicycles of AI deployment. Free, fast, slightly limited, perfect to learn on.

Concept Explanation

Streamlit Cloud (officially Streamlit Community Cloud):

Connect GitHub
Pick repo, branch, file
Add secrets (OPENAI_API_KEY etc.) in dashboard
Auto-deploys on every push
Free tier: roughly 1 GB RAM and 1 GB storage (limits change, so check the current docs); sleeps after inactivity

Hugging Face Spaces:

Connect or create a Space
Pick SDK: Streamlit, Gradio, Static, Docker
Add secrets in the Settings tab
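Free tier: CPU basic hardware (2 vCPU, 16 GB RAM); sleeps after inactivity; paid GPU hardware is billed per hour (from well under $1/hr for the smallest tier; check current pricing)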
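
Both platforms generally expose dashboard secrets to your app as environment variables (Streamlit does this for root-level secrets and also offers st.secrets), so one way to read them is a small helper that fails loudly when a key is missing. This is a sketch under that assumption; require_env is a hypothetical name, not part of either platform's API.

```python
import os


def require_env(name: str, environ=None) -> str:
    """Return a required secret from the environment or fail with a clear message."""
    # `environ` is injectable so the helper can be tested without touching os.environ.
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Missing secret {name}: set it in the platform's secrets settings, not in code."
        )
    return value
```

In an app you might call require_env("OPENAI_API_KEY") once at startup, so a missing secret shows up as one readable error rather than a failed API call later.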

Technical Breakdown

For both, your repo needs:

requirements.txt (or pyproject.toml for Streamlit)
An entry file (app.py or streamlit_app.py)
A .gitignore excluding .env, chroma_db/, data/
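
The checklist above can be sketched as a preflight script you run before pushing. The function name check_repo and the exact messages are assumptions for illustration; the file names come from the list above.

```python
from pathlib import Path

ENTRY_FILES = ("app.py", "streamlit_app.py")
IGNORED = (".env", "chroma_db/", "data/")


def check_repo(root: str) -> list:
    """Return a list of human-readable problems; an empty list means the repo looks ready."""
    repo = Path(root)
    problems = []
    if not (repo / "requirements.txt").exists() and not (repo / "pyproject.toml").exists():
        problems.append("missing requirements.txt (or pyproject.toml)")
    if not any((repo / name).exists() for name in ENTRY_FILES):
        problems.append("missing entry file (app.py or streamlit_app.py)")
    gitignore = repo / ".gitignore"
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    entries = {line.strip() for line in lines}
    for pattern in IGNORED:
        # Accept both "data/" and "data" as an exclusion for the same directory.
        if pattern not in entries and pattern.rstrip("/") not in entries:
            problems.append(f".gitignore does not exclude {pattern}")
    return problems
```

Running check_repo(".") from the project root and printing the result gives a quick list of what to fix. It only does literal line matching on .gitignore, so wildcard patterns are not recognized.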

For Spaces, also a README.md with frontmatter:

---
title: My PDF Chatbot
sdk: streamlit
sdk_version: 1.40.0
app_file: app.py
---

Visual Learning Suggestion

Side-by-side screenshots of both deployment dashboards with annotated secret-management areas.

Interactive Element

Deploy your Module 6 chatbot to Streamlit Cloud. Deploy your Module 9 PDF chatbot to Hugging Face Spaces.

Hands-on Lab

Both deploys, both URLs in your README.

Mini Exercise

When is Hugging Face Spaces preferable for a heavy model?

Common Mistakes

Hardcoding secrets (automated scanners crawl public repos and find leaked keys quickly)
Missing requirements.txt
Forgetting that ChromaDB persistence is ephemeral on free tiers (use external storage or accept it)

Debugging Tips

If your free-tier deploy keeps "sleeping" mid-demo, switch to a different platform or pay for an always-on tier when demoing to recruiters.

Knowledge Check Questions

Where do you set secrets on Streamlit Cloud?
Why use HF Spaces over Streamlit for some models?
What is the role of the README frontmatter on Spaces?

Quiz Questions

To deploy a Gradio app for free, with the option to upgrade to GPU hardware on demand, choose: a) Streamlit Cloud b) Hugging Face Spaces c) Vercel d) Railway Answer: b

Challenge Task

Add an "Embeddings on device" mode using sentence-transformers and deploy it to HF Spaces.

Real-world Use Cases

AI demos
Open-source community tools
Hobby projects

Industry Insight

Many open-source AI projects in 2026 maintain both a Streamlit Cloud and an HF Spaces deploy for redundancy and reach.

Interview Questions

Walk me through deploying a Streamlit app.
How do you handle secrets on these platforms?
What are their limitations?

Summary

Both are free, GitHub-integrated, beginner-friendly. Master both.

Lesson 10.3: Vercel for Next.js AI apps

Hook / Why This Matters

When you graduate from Streamlit to a real frontend, Vercel is where Next.js apps go.

Beginner Analogy

If Streamlit is a bicycle, Next.js + Vercel is a car. More setup, far more power, the bar for production work.

Concept Explanation

Vercel hosts Next.js apps as serverless functions + static assets, deploys on git push, manages secrets, and offers analytics.

For AI calls, you can use:

Next.js Route Handlers / API routes for backend calls
Vercel AI SDK for streaming chat UIs
Edge runtime for low-latency streaming

Technical Breakdown

A minimal Next.js chat endpoint:

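// app/api/chat/route.ts
// Matches the AI SDK 4.x API; helper names may differ in other major versions.
import { openai } from "@ai-sdk/openai"; // reads OPENAI_API_KEY from the environment
import { streamText } from "ai";

export async function POST(req: Request) {
  // The chat UI posts the running conversation as { messages: [...] }.
  const { messages } = await req.json();
  const result = streamText({ model: openai("gpt-4o-mini"), messages });
  // Stream tokens back to the client as they are generated.
  return result.toDataStreamResponse();
}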

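Deploy with the Vercel CLI (vercel deploy for a preview, vercel deploy --prod for production). Set OPENAI_API_KEY in the Vercel dashboard under the project's environment variables.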

Visual Learning Suggestion

A typical Next.js + Vercel deploy flow diagram: git push -> Vercel build -> serverless deploy -> public URL.

Interactive Element

Clone a Vercel AI Chatbot starter (vercel.com/templates). Deploy. Customize the system prompt.

Hands-on Lab

Deploy a Next.js AI chat app. Add a system prompt. Push to GitHub. Share the URL.

Mini Exercise

When does serverless become a poor fit for your AI app?

Common Mistakes

Heavy long-running jobs in serverless (timeout issues)
Assuming the edge runtime is required for streaming (the default Node.js runtime streams too; edge mainly reduces latency)
Storing large vector DBs in /tmp (use external services)

Debugging Tips

If functions time out, move long jobs to Railway or a queue. Vercel functions are designed for short tasks.

Knowledge Check Questions

What is the Vercel AI SDK?
Why use edge runtime for chat?
Why move heavy jobs off Vercel?

Quiz Questions

The most natural deploy target for a Next.js AI chatbot is: a) Streamlit Cloud b) Hugging Face Spaces c) Vercel d) Railway Answer: c

Challenge Task

Add a "models" dropdown and route to different providers via a single API route.

Real-world Use Cases

Production-facing AI features
Public marketing demos
SaaS apps

Industry Insight

Vercel + Next.js is now the de facto stack for shipped, polished AI products built by small teams.

Interview Questions

Why Next.js for AI front-ends?
What is the Vercel AI SDK?
How do you handle long-running AI jobs in Vercel?

Summary

Vercel + Next.js is the bar for production. Learn it once and your AI portfolio looks senior.

Lesson 10.4: Railway for FastAPI backends

Hook / Why This Matters

When your AI app needs a long-running backend (RAG ingest, custom embedding pipelines), Railway is one of the cheapest and simplest ways to host it.

Beginner Analogy

Vercel is a courier for parcels. Railway is a warehouse where the worker actually lives.

Concept Explanation

Railway runs containerized services with a generous free trial and pay-as-you-grow pricing. Connect a GitHub repo, set env vars, deploy. It runs persistently (no cold starts unless you want them).

Use it for:

FastAPI backends
Persistent vector DB hosts
Background workers (Celery, RQ)
Scheduled ingest jobs

Technical Breakdown

A minimal FastAPI:

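# main.py
from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
# OpenAI() reads OPENAI_API_KEY from the environment, so no key appears in code.
client = OpenAI()

class Q(BaseModel):
    """Request body: {"question": "..."}"""
    question: str

@app.post("/chat")
def chat(q: Q):
    # Forward the question as a single user message and return the reply text.
    r = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": q.question}],
    )
    return {"answer": r.choices[0].message.content}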

requirements.txt (shown unpinned for brevity; pin exact versions once the app works, for example from pip freeze):

fastapi
uvicorn
openai

Procfile (Railway):

web: uvicorn main:app --host 0.0.0.0 --port $PORT

Deploy: connect GitHub repo to Railway, set OPENAI_API_KEY, click deploy.
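
The Procfile relies on the platform injecting PORT. If you start the server from Python instead, the same rule can be sketched as below; resolve_port is a hypothetical helper and the default of 8000 is only an assumption for local runs.

```python
import os


def resolve_port(environ=None, default: int = 8000) -> int:
    """Use the platform-provided PORT if present, else a local default."""
    # `environ` is injectable so the helper can be tested without touching os.environ.
    env = os.environ if environ is None else environ
    return int(env.get("PORT", default))
```

In main.py this could be passed to uvicorn's programmatic entry point, for example uvicorn.run(app, host="0.0.0.0", port=resolve_port()). Binding to a hardcoded port is the usual reason a deploy builds fine but never becomes reachable.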

Visual Learning Suggestion

Architecture: Next.js on Vercel to FastAPI on Railway to OpenAI/Anthropic/Gemini.

Interactive Element

Deploy the FastAPI above. Hit /chat from your terminal with curl.
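
If you prefer Python to curl, a standard-library client for the same endpoint might look like this. It is a sketch that assumes the /chat route and the {"question": ...} and {"answer": ...} shapes from the FastAPI example; build_chat_request and ask are hypothetical names, and the base URL is whatever Railway assigns to your service.

```python
import json
import urllib.request


def build_chat_request(base_url: str, question: str) -> urllib.request.Request:
    """Build a POST request for the /chat endpoint without sending it."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, question: str, timeout: float = 30.0) -> str:
    """Send the question to the deployed API and return the 'answer' field."""
    req = build_chat_request(base_url, question)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["answer"]
```

Splitting request building from sending keeps the first function testable offline; ask() is the only part that touches the network.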

Hands-on Lab

Build a public /summarize endpoint that takes text and returns a 3-sentence summary. Deploy. Test.

Mini Exercise

When does FastAPI on Railway beat Next.js API routes on Vercel?

Common Mistakes

Forgetting the $PORT env var
Not pinning Python and library versions
Leaving spend limits open

Debugging Tips

If deploys fail, check the build log for missing system libs. Sometimes you need apt packages declared in a nixpacks.toml.

Knowledge Check Questions

When use Railway over Vercel?
What is uvicorn?
Why pin versions?

Quiz Questions

A long-running Python job is best deployed to: a) Vercel b) Streamlit Cloud c) Railway or Fly.io d) Hugging Face Spaces free CPU Answer: c

Challenge Task

Add a background ingest endpoint (/ingest) that downloads a PDF URL, chunks it, embeds it, and stores the vectors in a Postgres + pgvector database hosted on Railway (within your trial or plan credits).

Real-world Use Cases

Production FastAPI backends
Scheduled crawlers
Worker services for AI pipelines

Industry Insight

In 2026, a common indie stack is Next.js on Vercel + FastAPI on Railway + Supabase for the database + ChromaDB or pgvector for retrieval. Master this and you can ship most small AI products.

Interview Questions

Compare Vercel and Railway.
How do you deploy FastAPI?
How do you split a Next.js frontend from a Python backend?

Summary

Railway is your "real backend" host. Cheap, persistent, container-friendly. Pair it with Vercel for full-stack AI apps.

Lesson 10.5: Secrets, logging, monitoring, cost guards

Hook / Why This Matters

Production AI without observability is gambling. This lesson is the safety net.

Beginner Analogy

A pilot who never reads the instruments will eventually crash. Logs and dashboards are the instruments.

Concept Explanation

Five non-negotiable production habits:

Secrets in env vars or secret manager. Never in code.
Structured logs for every LLM call: timestamp, user_id, prompt_hash, model, tokens, latency, status.
Cost dashboard: provider dashboard + your own.
Hard spending caps in provider dashboards.
Rate limit per user/IP to prevent abuse.

For Python, log with the standard logging module and ship the output to Better Stack (formerly Logtail), Datadog, or Grafana Cloud (free tiers exist).
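
To make those logs machine-readable, one option is a formatter that emits one JSON object per line. This is a minimal sketch assuming log calls pass a dict as the message (as call_llm in this lesson does); JsonFormatter and configure_json_logging are hypothetical names, built only on the standard logging module.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
        }
        if isinstance(record.msg, dict):
            # Structured call: merge the dict's fields into the output.
            payload.update(record.msg)
        else:
            payload["message"] = record.getMessage()
        return json.dumps(payload, default=str)


def configure_json_logging(name: str = "ai") -> logging.Logger:
    """Attach a JSON stream handler to the named logger and return it."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger
```

Calling configure_json_logging() once at startup is enough; log shippers can then parse each line without custom regexes.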

Technical Breakdown

Wrap LLM calls:

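import hashlib, json, logging, time
from openai import OpenAI

log = logging.getLogger("ai")
client = OpenAI()  # reads OPENAI_API_KEY from the environment

def call_llm(messages, user_id):
    started = time.perf_counter()  # monotonic clock, safer than time.time() for latency
    # Hash the prompt so calls can be correlated without storing raw user text.
    h = hashlib.sha256(json.dumps(messages, sort_keys=True).encode()).hexdigest()[:10]
    try:
        r = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
        log.info({
            "user_id": user_id, "prompt_hash": h, "model": "gpt-4o-mini",
            "tokens": r.usage.total_tokens,
            "latency_ms": int((time.perf_counter() - started) * 1000),
            "status": "ok",
        })
        return r
    except Exception as e:
        # Log the failure with the same correlation fields, then let the caller decide.
        log.error({"user_id": user_id, "prompt_hash": h, "status": "error", "err": str(e)})
        raise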

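A simple cost guard is a per-user counter in Redis that caps daily token spend: add each call's token count to the user's counter and refuse new requests once the daily cap is reached.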
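
As a sketch of that per-user cost guard, the class below keeps the counters in memory so it runs without a Redis server. DailyTokenBudget is a hypothetical helper; with Redis, the same idea would typically map to incrementing a key per user and day (INCRBY) and letting it expire.

```python
from collections import defaultdict
from datetime import date


class DailyTokenBudget:
    """In-memory stand-in for a per-user, per-day token counter."""

    def __init__(self, daily_limit: int):
        self.daily_limit = daily_limit
        self._used = defaultdict(int)  # (user_id, day) -> tokens used

    def remaining(self, user_id: str, today=None) -> int:
        day = today or date.today()
        return max(0, self.daily_limit - self._used[(user_id, day)])

    def allow(self, user_id: str, today=None) -> bool:
        """True while the user still has budget left for the day."""
        return self.remaining(user_id, today) > 0

    def record(self, user_id: str, tokens: int, today=None) -> None:
        """Add the tokens consumed by one completed call."""
        day = today or date.today()
        self._used[(user_id, day)] += tokens
```

A typical flow is to check allow(user_id) before calling the model and call record(user_id, r.usage.total_tokens) afterwards. Because counters live in process memory, they reset on restart and are not shared across workers, which is why the lesson points to Redis for real deployments.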

Visual Learning Suggestion

A "production layer cake" diagram: app -> logger -> dashboard -> alerting. Each layer labeled.

Interactive Element

Set a $5 monthly cap on OpenAI right now. Take a screenshot for accountability.

Hands-on Lab

Add structured logging to your PDF chatbot. Print logs locally as JSON.

Mini Exercise

Why hash the prompt instead of logging the raw prompt?

Common Mistakes

Logging full prompts (privacy and storage cost)
No spend caps (free trial gets drained in hours)
No per-user rate limits (one user can rack up your bill)
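
A per-user rate limit can be sketched with a sliding window of recent request times. SlidingWindowLimiter is a hypothetical, in-memory helper for a single process; the key can be a user ID or an IP address.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per key within the last window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

In an endpoint you would call allow(user_id) first and return HTTP 429 when it is False, before any tokens are spent.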

Debugging Tips

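If a feature seems "stuck", check logs for repeated errors. For transient failures (timeouts, rate limits), add a bounded retry with backoff around the offending call; for anything else, fix the root cause rather than retrying.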
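
A bounded retry with exponential backoff might be sketched as follows. retry_with_backoff is a hypothetical helper; the default exception types are Python built-ins, and with the OpenAI SDK you would likely pass its own timeout and rate-limit exception classes instead (the client also has a built-in max_retries setting worth checking first).

```python
import random
import time


def retry_with_backoff(
    fn,
    retryable=(TimeoutError, ConnectionError),
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep=time.sleep,
):
    """Call fn(); retry only on `retryable` errors, doubling the wait each time."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            # Jitter spreads retries out so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

Errors outside `retryable` (for example a bad request) propagate immediately, which keeps a real bug from being hidden behind repeated, billable calls.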

Knowledge Check Questions

Why hash prompts rather than store them?
Why cap spend per user?
What tool would you use for log aggregation on a free tier?

Quiz Questions

The most overlooked production safety control is: a) Hard spending caps b) Bigger model c) More retries d) JSON mode Answer: a

Challenge Task

Build a tiny "/health" endpoint that reports last hour's: requests, error count, average latency, total tokens.

Real-world Use Cases

Production AI apps
SaaS demos at scale
Internal AI tools

Industry Insight

The single biggest "junior to mid-level" promotion driver in 2026 AI roles is "you set up our cost dashboard and prevented the next outage". That is observability.

Interview Questions

What logs do you ship for LLM calls?
How do you prevent runaway cost?
How do you debug an LLM app in production?

Summary

Secrets, logs, dashboards, caps, rate limits. Five habits. Always.

Module 10 Recap

You can deploy any of the four common AI app types to the right platform, with secrets, logs, and cost guards in place. You ship production-ready code now.

Next Module

Module 11: AI Safety, Hallucinations, and Responsible AI