Why LLMs Hallucinate (And Why We Shouldn't Be Surprised)
Understanding the Fundamental Nature of Large Language Models in Veterinary AI Applications
As large language models (LLMs) like ChatGPT, Claude, and Bard become increasingly integrated into veterinary workflows—from client communication to clinical decision support—one concern dominates conversations: hallucinations. These AI systems sometimes generate confident-sounding but factually incorrect information, and the veterinary community is rightfully cautious.
But here's what might surprise you: LLM hallucinations aren't a bug to be fixed—they're an inevitable feature of how these systems fundamentally work. Understanding why can help us use these tools more effectively and set appropriate expectations for their role in veterinary practice.
The Closed-Book Exam Analogy
Imagine asking a veterinary student to take a comprehensive exam under these conditions:
No textbooks, notes, or reference materials allowed
No internet access
No ability to look up drug dosages, normal lab values, or disease prevalence data
Must answer every question with confidence, even on topics they've never studied
When that student occasionally gets facts wrong or fills in gaps with plausible-sounding but incorrect information, would you be surprised? Of course not. Yet this is essentially how most LLMs operate.
LLMs are trained on massive datasets and then deployed as "closed-book" systems. They can't:
Access real-time information
Look up current drug databases
Verify facts against authoritative sources
Check their work against reference materials
Update their knowledge after training
They're working entirely from "memory"—patterns learned during training—without the ability to fact-check or research their responses.
The Human Citation Challenge
Consider this question: "What is the capital of France?"
You probably answered "Paris" instantly. But can you cite your source? Can you remember exactly where and when you learned this fact? Likely not—it's become integrated knowledge.
This is how humans typically store and retrieve basic information. We rarely walk around with mental footnotes for fundamental facts. We've synthesized information from multiple sources over time into confident knowledge, even though we can't always trace the provenance.
LLMs exhibit a similar pattern. They've "learned" that certain relationships exist (breed predispositions, drug interactions, anatomical facts) from training data, but they can't point back to specific sources any more than you can cite where you first learned that dogs have four legs.
The Difference: Humans Can Look Things Up
The crucial difference is that when accuracy matters, humans can:
Consult reference materials
Verify facts against authoritative sources
Cross-check information
Acknowledge uncertainty ("Let me look that up")
Most deployed LLMs can't do this—they're operating from trained patterns without external verification mechanisms.
The Mathematical Reality of Language Generation
Here's the fundamental technical reason why hallucinations are inevitable:
LLMs work by predicting the next word (or token) based on the preceding context. At each step, they assign a probability to every plausible continuation:
Given the words "The normal heart rate for a healthy adult dog is approximately..." what word should come next?
The model might assign probabilities like:
"60" (15% probability)
"70" (25% probability)
"80" (30% probability)
"90" (20% probability)
"100" (10% probability)
The system then samples from this probability distribution. Sometimes it will choose the most likely option, sometimes a less likely one. This sampling process introduces inherent randomness.
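To make this concrete, here is a toy sketch in plain Python (not a real model) of that sampling step, using the illustrative probabilities above. The numbers are invented for the example; real models choose among tens of thousands of tokens, not five.

```python
# Toy illustration of next-token sampling, NOT a real language model.
import random

# Hypothetical probabilities after the prompt
# "The normal heart rate for a healthy adult dog is approximately..."
next_token_probs = {"60": 0.15, "70": 0.25, "80": 0.30, "90": 0.20, "100": 0.10}

def sample_next_token(probs):
    """Pick one token at random, weighted by the model's probabilities."""
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

# Ask for the same "fact" five times: the answer can change from run to run,
# even though the underlying distribution never did.
for _ in range(5):
    print(sample_next_token(next_token_probs))
```

Real systems add a temperature setting that sharpens or flattens this distribution, but the core point stands: generation is weighted sampling, not lookup, so even a well-trained model can occasionally emit a plausible wrong number.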
Why This Matters for Veterinary Applications
Even if the LLM has learned correct patterns from training data, the probabilistic nature of generation means:
Correct information can be slightly corrupted during generation
Multiple plausible options might exist, and the model might choose incorrectly
Novel combinations of familiar concepts might create plausible-sounding but wrong information
For example, if an LLM has learned:
"Acepromazine is used for sedation"
"Typical doses are weight-based"
"Canine sedation protocols vary by procedure"
It might generate: "Acepromazine is typically dosed at 0.1-0.3 mg/kg for routine procedures"—which sounds authoritative but might not reflect current best practices or might conflate different protocols.
The Veterinary-Specific Risks
This inherent uncertainty creates particular challenges in veterinary medicine:
Clinical Decision-Making: Wrong drug dosages, contraindications, or diagnostic interpretations can directly harm patients.
Client Communication: Confident-sounding but incorrect information about prognosis, treatment options, or costs can damage trust and lead to poor decisions.
Regulatory Compliance: Incorrect information about drug withdrawal times, prescription requirements, or documentation standards creates legal risks.
Species-Specific Variations: LLMs might conflate information between species ("This works in dogs, so it probably works in cats") in ways that veterinarians would never do.
Implications for Veterinary AI Applications
Understanding why hallucinations occur helps us develop better strategies for using LLMs in practice:
What LLMs Do Well
Pattern recognition in complex data
Synthesis of information from multiple sources
Communication and explanation of concepts
Workflow automation for routine tasks
What Requires Extreme Caution
Specific medical recommendations without verification
Drug dosages and administration protocols
Diagnostic interpretations requiring current knowledge
Species-specific treatment advice
The Verification Imperative
Because hallucinations are inevitable, any LLM-generated medical information must be verified against authoritative sources. This isn't a limitation to overcome—it's a fundamental requirement for safe use.
Think of LLMs as extremely knowledgeable but occasionally unreliable research assistants. They can:
Help you brainstorm differential diagnoses
Draft client communications for you to review and edit
Suggest areas to investigate further
Explain complex concepts in accessible language
But they cannot:
Replace your clinical judgment
Provide definitive medical recommendations
Be trusted for critical dosing or safety information
Substitute for current reference materials
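To show what that verification step can look like in software, here is a minimal sketch that checks an LLM-suggested dose against a reference range before anyone acts on it. The formulary values, drug, and function names are hypothetical placeholders for illustration only, not dosing guidance.

```python
# Minimal sketch of an automated verification step. The reference values below
# are PLACEHOLDERS for illustration, not real dosing guidance.

FORMULARY_MG_PER_KG = {
    ("acepromazine", "canine"): (0.01, 0.05),  # hypothetical range
}

def dose_is_plausible(drug, species, dose_mg_per_kg):
    """Return True only if the suggested dose falls inside the reference range."""
    reference = FORMULARY_MG_PER_KG.get((drug.lower(), species.lower()))
    if reference is None:
        return False  # no reference data: treat as unverified
    low, high = reference
    return low <= dose_mg_per_kg <= high

suggestion = {"drug": "acepromazine", "species": "canine", "dose_mg_per_kg": 0.2}
if not dose_is_plausible(**suggestion):
    print("Flag for human review: dose not confirmed against the formulary.")
```

In a real deployment the reference data would come from a licensed, current formulary, and anything that fails the check or lacks reference data would route to a human rather than pass through silently.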
Current Solutions: Hybrid Approaches in Practice
The solution isn't to avoid LLMs—it's to integrate them thoughtfully with verification systems. These approaches are being deployed today:
Retrieval-Augmented Generation (RAG): LLMs connected to current databases so they can look up facts rather than relying solely on training memory (a simplified sketch of this pattern appears at the end of this section). In my experience implementing these systems, they significantly reduce hallucinations while maintaining the conversational capabilities that make LLMs valuable.
Multi-Step Verification: Systems that check LLM outputs against authoritative databases before presenting information. Building these verification pipelines has convinced me they're essential for any application where accuracy is critical.
Confidence Scoring: Models that indicate their uncertainty about specific statements, allowing users to see where additional verification is most critical (a toy example also appears at the end of this section).
Citation Integration: Systems that can point to specific sources for factual claims are already being implemented in enterprise applications.
These approaches acknowledge that hallucinations are inherent to the technology while providing practical solutions that organizations are using today to mitigate risks.
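For readers curious what retrieval augmentation actually looks like, here is a deliberately simplified sketch. The tiny in-memory "library" and the llm_complete parameter are hypothetical stand-ins for a real document index and a real model API; the point is the shape of the pipeline, not any vendor's interface.

```python
# Simplified shape of a retrieval-augmented generation (RAG) pipeline.
# The in-memory LIBRARY and the llm_complete callable are hypothetical
# stand-ins for a real document index and a real model API.

LIBRARY = [
    "Excerpt from a licensed formulary page ...",
    "Excerpt from a current clinical guideline ...",
]

def retrieve(question, top_k=2):
    """Toy retrieval: rank passages by how many question words they share."""
    words = set(question.lower().split())
    return sorted(LIBRARY, key=lambda p: -len(words & set(p.lower().split())))[:top_k]

def answer_with_sources(question, llm_complete):
    passages = retrieve(question)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer using only the numbered excerpts below, citing the excerpt "
        "number for every claim. If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    # The model now paraphrases retrieved, current text instead of
    # reconstructing facts purely from training memory.
    return llm_complete(prompt)
```

Production systems replace the toy keyword match with semantic search over thousands of documents, but the division of labor is the same: the reference library supplies the facts, and the LLM supplies the language.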
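As a small illustration of the confidence scoring idea mentioned above: one common signal is the entropy of the model's own next-token distribution. A flat distribution (the model hesitating between many options) suggests that part of the answer deserves extra scrutiny. This sketch reuses the illustrative heart-rate probabilities from earlier and is only meant to show the idea, not any production scoring method.

```python
# Toy confidence signal: entropy of a next-token probability distribution.
# Higher entropy = the model was less certain = verify that span more carefully.
import math

def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

uncertain = {"60": 0.15, "70": 0.25, "80": 0.30, "90": 0.20, "100": 0.10}
confident = {"four": 0.97, "three": 0.02, "five": 0.01}

print(f"heart-rate continuation: {entropy_bits(uncertain):.2f} bits")   # ~2.23
print(f"'dogs have ___ legs':    {entropy_bits(confident):.2f} bits")   # ~0.22
```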
Proven Veterinary Implementation
The theoretical solutions I've described aren't just academic concepts—they're already working in veterinary practice. During my involvement in developing LifeLearn's Sofie AI, we implemented exactly this type of advanced RAG system, grounding LLM responses in tens of thousands of pages of licensed veterinary medical content. The result shows how a proper implementation can dramatically reduce hallucinations without sacrificing the conversational experience that makes these tools useful for clinical decision support.
This real-world example shows that the hybrid approaches needed for safe veterinary AI aren't just possible—they're available today when built with appropriate veterinary expertise and content infrastructure.
Key Insights for Veterinary Practice
🧠 Understand the fundamental limitation: LLMs are "closed-book" systems working from trained patterns, not live databases. Occasional errors are mathematically inevitable.
✅ Always verify medical information: Any drug dosages, treatment protocols, or diagnostic recommendations from LLMs must be checked against current veterinary references.
🎯 Use LLMs for their strengths: Pattern recognition, communication drafting, concept explanation, and workflow automation—not for definitive medical advice.
📚 Maintain your reference standards: LLMs supplement but never replace current veterinary literature, drug formularies, and clinical guidelines.
🚨 Recognize high-risk scenarios: Be especially cautious with species-specific information, new drugs or procedures, and any recommendations that seem "surprising" or novel.
👥 Educate your team: Ensure all staff understand that LLM outputs require verification, especially for any client-facing communications about medical topics.
🔄 Treat LLMs as research assistants: Valuable for generating ideas and drafts, but everything needs professional review before implementation.
📊 Stay updated on AI developments: As retrieval-augmented and citation-capable systems emerge, the landscape will evolve—but verification will always be necessary.
Conclusion
LLM hallucinations aren't a temporary glitch to be solved—they're an intrinsic characteristic of how these systems work. Understanding this helps us use them appropriately: as powerful tools for pattern recognition, communication assistance, and workflow support, but never as authoritative sources for medical information.
The goal isn't to eliminate hallucinations (which is mathematically impossible) but to build workflows that harness LLM capabilities while maintaining the rigorous verification standards that veterinary medicine demands.
Just as we've learned to use diagnostic tests by understanding their sensitivity, specificity, and limitations, we can effectively integrate LLMs by understanding their probabilistic nature and inherent uncertainty. The key is treating them as sophisticated assistants rather than infallible oracles.
In veterinary practice, where patient safety and client trust are paramount, this understanding isn't just academic—it's essential for responsible AI adoption.
What's your experience with AI hallucinations in practice? Have you encountered confident-sounding but incorrect information from AI tools? I'd love to hear about both the challenges you've faced and any solutions you've found effective.
Reply to this post or reach out directly - your real-world experiences help shape the practical insights that make these analyses valuable for the veterinary community.