The retrieve-and-answer flow with citations
This is the lesson where the chatbot actually answers. The trick is forcing citations so users trust the output.
Think of a research assistant who not only answers your question but also tells you which book and page the answer came from.
Flow per query:
- Embed the user question.
- Query Chroma for top-K (e.g., 5) chunks.
- Build a prompt with system rules + retrieved chunks + the user question.
- Tell the model to cite (filename + page) for each claim.
- Stream the answer.
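To make the top-K step above concrete, here is a minimal brute-force sketch of what "find the K closest chunks" means. It is an illustration only: Chroma uses its own index and distance metric internally, and the helper names `cosine` and `top_k` as well as the toy vectors are made up for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunk_vecs, k=5):
    """Return the indices of the k chunks most similar to the query, best first."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy 2-D "embeddings" (hypothetical values, real ones have hundreds of dimensions).
chunks = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
best = top_k([1.0, 0.0], chunks, k=2)
print(best)  # [0, 2]
```

A vector store does the same ranking, but with an index that avoids comparing the query against every chunk.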
SYSTEM = """You are a PDF chatbot. Use ONLY the provided sources to answer.
If the answer is not in the sources, say "I could not find that in the documents."
After each fact, cite the source like [filename p.X].
"""
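def retrieve(question, k=5):
    """Return the top-k (chunk text, metadata) pairs for a question."""
    # embed() and col (the Chroma collection) are assumed to be defined earlier.
    q_emb = embed(question)
    res = col.query(query_embeddings=[q_emb], n_results=k)
    # Results are nested per query; we sent one query, so take index 0.
    docs = res["documents"][0]
    metas = res["metadatas"][0]
    return list(zip(docs, metas))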
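def answer(question):
    """Stream the answer piece by piece, with tagged sources in the prompt."""
    pairs = retrieve(question, k=5)
    # Tag each chunk with its file and page so the model can cite it.
    context = "\n\n".join(
        f"[Source: {m['source']} p.{m['page']}]\n{d}" for d, m in pairs
    )
    messages = [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": f"Sources:\n{context}\n\nQuestion: {question}"},
    ]
    stream = openai_client.chat.completions.create(
        model="gpt-4o-mini", messages=messages, stream=True
    )
    text = ""
    for chunk in stream:
        if not chunk.choices:  # a chunk can arrive with an empty choices list
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            text += delta
            yield delta
    # In a generator, this value is only reachable via StopIteration.value.
    return text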
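Asking for citations does not guarantee the model cites correctly, so it can help to check the finished answer against the chunks that were actually retrieved. The sketch below is one possible way to do that, assuming the model follows the [filename p.X] format from the system prompt and that each metadata dict has `source` and `page` keys as in the code above. The helpers `extract_citations` and `unsupported_citations` are hypothetical names, not part of any library.

```python
import re

# Matches tags like [report.pdf p.12]; the format mirrors the system prompt.
CITATION_RE = re.compile(r"\[([^\[\]]+?)\s+p\.(\d+)\]")

def extract_citations(text):
    """Return the (filename, page) pairs cited in the answer text."""
    return [(name.strip(), int(page)) for name, page in CITATION_RE.findall(text)]

def unsupported_citations(text, metas):
    """Return citations that match none of the retrieved chunks' metadata."""
    allowed = {(m["source"], int(m["page"])) for m in metas}
    return [c for c in extract_citations(text) if c not in allowed]

# Hypothetical retrieved metadata and model reply, for illustration only.
metas = [{"source": "report.pdf", "page": 3}, {"source": "notes.pdf", "page": 7}]
reply = "Revenue grew 12% [report.pdf p.3]. Costs fell [report.pdf p.9]."
print(unsupported_citations(reply, metas))  # [('report.pdf', 9)]
```

To use it with the lesson's code, collect the streamed text with `"".join(answer(question))` and pass in the metadata from the same retrieval. A non-empty result is a signal to warn the user or retry, since the model cited a page it was never shown.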