Lesson 11.1: Hallucinations: causes and concrete mitigations | GeekHub Learn

A confidently wrong AI answer can lose a customer or a court case. This lesson is the practical defense.

Picture a friend who never says "I do not know": charming, dangerous, and eventually fired from any serious job. An unguarded LLM is that friend.

Hallucinations happen because LLMs sample probable tokens; they do not retrieve facts. Causes:

Questions about events after the training cutoff
Questions about your private data, which the model never saw
Long contexts where the answer sits in the middle and gets lost
Vague prompts that leave too much to the model

Mitigations:

RAG with strict refusal rules
Tool use: search, calculators, DB lookups for facts
Citations: require sources
Temperature 0 for factual tasks (reduces run-to-run variation; not a fix on its own)
Verifier model: a second LLM that fact-checks against sources
Human in the loop for high-stakes use cases
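
"Require sources" only helps if you check that the sources are actually cited. A minimal sketch, assuming your prompt tells the model to cite numbered sources as [1], [2], and so on. The function name check_citations and the marker format are illustrative assumptions, not part of any library.

```python
import re

def check_citations(answer: str, num_sources: int) -> dict:
    """Check that an answer cites sources as [1], [2], ... and that every marker is in range."""
    # Collect every numeric marker such as [2] that appears in the answer.
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    # A marker is invalid if it points at a source that was never provided.
    invalid = sorted(n for n in cited if n < 1 or n > num_sources)
    return {
        "cited": sorted(cited),
        "invalid": invalid,
        # Passing requires at least one citation and no out-of-range markers.
        "ok": bool(cited) and not invalid,
    }

print(check_citations("The refund window is 30 days [2].", num_sources=3))
```

This only confirms that a citation exists and points at a real source. It does not confirm the source supports the claim; that is the verifier's job.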

A verifier pattern:

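from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_then_verify(question):
    # generate_with_rag is your own RAG helper; it must return {"text": ..., "sources": ...}
    ans = generate_with_rag(question)
    # Ask a second model to judge the answer against the retrieved sources only
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Reply YES if the answer is fully grounded in the sources, otherwise NO with reason."},
            {"role": "user", "content": f"Sources:\n{ans['sources']}\n\nAnswer:\n{ans['text']}"},
        ],
    )
    return ans, verdict.choices[0].message.content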
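
The verifier returns free text, so the calling code still has to decide whether to ship the answer. A minimal sketch of one way to do that, assuming the verifier follows the YES / NO-with-reason format from the system prompt. The helper name parse_verdict is hypothetical. It fails closed: anything that is not a clear YES counts as not grounded.

```python
def parse_verdict(verdict_text: str) -> tuple[bool, str]:
    """Turn the verifier's free-text reply into (grounded, reason). Fails closed."""
    text = verdict_text.strip()
    # Look only at the first word, ignoring trailing punctuation such as "NO:".
    first_word = text.split(maxsplit=1)[0].strip(".,:;!").upper() if text else ""
    if first_word == "YES":
        return True, ""
    if first_word == "NO":
        # Everything after the leading "NO" is treated as the reason.
        reason = text[2:].lstrip(" .,:;-\n")
        return False, reason or "no reason given"
    # Unexpected output is treated as a failed check, not a pass.
    return False, f"unparseable verdict: {text!r}"

print(parse_verdict("NO: the price is not in the sources."))
```

Failing closed costs you some good answers when the verifier misbehaves, but that is usually the cheaper error in a factual product.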

Visualize it

A funnel diagram: question -> retrieve -> generate -> verify -> ship. Label each step with the mitigation it adds.

Try it now

Ask any LLM about a fake event in your private life ("In 2024 I won the Bangalore Marathon. What time did I run?"). Note the hallucination. Add a system rule to refuse if unknown. Retry.
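
A minimal sketch of the before/after setup for this experiment. The wording of REFUSAL_RULE and the helper build_messages are illustrative assumptions; tune the rule for your own model. The function only builds the messages list, so you can pass it to whichever chat API you use.

```python
# Hypothetical refusal rule; the exact wording is an assumption, not a standard.
REFUSAL_RULE = (
    "Answer only from facts stated in the conversation or the provided context. "
    "If the answer is not there, reply exactly: I do not know."
)

def build_messages(question: str, with_refusal_rule: bool = True) -> list[dict]:
    """Build a chat messages list, optionally prefixed with the refusal rule."""
    messages = []
    if with_refusal_rule:
        messages.append({"role": "system", "content": REFUSAL_RULE})
    messages.append({"role": "user", "content": question})
    return messages

question = "In 2024 I won the Bangalore Marathon. What time did I run?"
baseline = build_messages(question, with_refusal_rule=False)  # first attempt
guarded = build_messages(question)                            # retry with the rule
print(len(baseline), len(guarded))
```

Send both lists to the same model and compare the replies side by side.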

Hands-on lab

Add a "do not invent" rule and a verifier step to your PDF chatbot. Re-run your eval.

Try it now

When is human review the only acceptable mitigation?

Common mistakes

Trusting "temperature 0" as a hallucination fix (it is not)
No "I do not know" path
Skipping citations on factual answers
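
Why temperature 0 is not a fix: lowering the temperature concentrates probability on the top-scoring token, and at the limit the model picks it every time. If the top-scoring token is an invented fact, you get the same invention on every run. A toy sketch with made-up logits (the numbers are assumptions for illustration, not output from a real model):

```python
import math

def softmax(logits: dict, temperature: float) -> dict:
    """Convert logits to probabilities at a given temperature (> 0)."""
    scaled = [value / temperature for value in logits.values()]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {token: e / total for token, e in zip(logits, exps)}

def greedy(logits: dict) -> str:
    """What temperature 0 amounts to in the limit: always take the top-scoring token."""
    return max(logits, key=logits.get)

# Made-up logits for the next token after "You finished the marathon in".
toy_logits = {"3:45": 2.0, "4:10": 1.6, "I do not know": 0.3}

print(softmax(toy_logits, temperature=1.0))  # probability spread across options
print(softmax(toy_logits, temperature=0.1))  # almost all mass on the top token
print(greedy(toy_logits))                    # the invented time wins every run
```

Low temperature makes the answer consistent. It does not make it correct.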

Debugging tip

If hallucinations spike after a model upgrade, your prompts may be too permissive. Tighten refusal language.
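
To know whether hallucinations actually spiked, compare the same eval set before and after the upgrade. A minimal sketch, assuming you store one verifier verdict per eval question as a boolean (True means grounded). The function name and the sample verdicts are hypothetical.

```python
def ungrounded_rate(verdicts: list[bool]) -> float:
    """Share of eval answers the verifier marked as not grounded."""
    if not verdicts:
        return 0.0
    return sum(1 for grounded in verdicts if not grounded) / len(verdicts)

before = [True, True, True, False]   # hypothetical verdicts, old model
after = [True, False, False, False]  # hypothetical verdicts, new model
print(ungrounded_rate(before), ungrounded_rate(after))
```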

Challenge

Add a "groundedness score" computation that flags low-confidence answers in red in your UI.
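
One possible starting point for the challenge, as a rough sketch only. It scores an answer by the fraction of its sentences whose words mostly appear in some source. Word overlap is a crude stand-in for real groundedness (it misses paraphrases and can be fooled by reused words), and the 0.7 and 0.8 thresholds and all function names are assumptions to tune against your eval.

```python
import re

def _tokens(text: str) -> set:
    """Lowercased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def groundedness_score(answer: str, sources: list[str], min_overlap: float = 0.7) -> float:
    """Fraction of answer sentences whose words mostly appear in a single source."""
    source_tokens = [_tokens(s) for s in sources]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences or not source_tokens:
        return 0.0
    supported = 0
    for sentence in sentences:
        words = _tokens(sentence)
        if not words:
            continue
        # Best word overlap between this sentence and any one source.
        best = max(len(words & st) / len(words) for st in source_tokens)
        if best >= min_overlap:
            supported += 1
    return supported / len(sentences)

def flag_color(score: float, threshold: float = 0.8) -> str:
    """Return "red" for low-scoring answers, "default" otherwise."""
    return "red" if score < threshold else "default"

sources = ["Refunds are accepted within 30 days of purchase."]
answer = "Refunds are accepted within 30 days. Shipping is always free worldwide."
score = groundedness_score(answer, sources)
print(score, flag_color(score))
```

A stronger version would replace the overlap check with the verifier model, scoring each sentence separately.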

Where this shows up

Customer support
Medical assistants (with human review)
Legal research helpers
Financial Q&A

From the field

In 2026 every serious team has a "hallucination dashboard". You will too, eventually.

Recap

Hallucinations are a consequence of sampling, not a bug you can patch out. Defend with RAG, refusal rules, citations, verification, and human review where stakes are high.

Quick recall

Why do LLMs hallucinate?

Quiz time

1. The strongest single defense against hallucination is