MedHalu: Hallucinations in Responses to Healthcare Queries by Large Language Models

Jun 1, 2026 · Vibhor Agarwal, Yiqiao Jin, Mohit Chandra, Munmun De Choudhury, Srijan Kumar, Nishanth Sastry
Abstract
Large language models are increasingly used for consumer healthcare queries, but their responses can contain subtle hallucinations with serious implications for patient safety. We introduce MedHalu, a benchmark for studying hallucinations in LLM responses to healthcare queries, with fine-grained annotations of hallucination types and severity. We analyze hallucination patterns across LLMs, query types, and medical specialties, and discuss implications for safer deployment in consumer healthcare.
Publication
International AAAI Conference on Web and Social Media (ICWSM) 2026

Authors
Yiqiao Jin
Ph.D. Candidate in Computer Science
My research focuses on adaptive and efficient AI systems, with emphasis on LLM agents, agent memory, self-distillation, multimodal LLMs, and structured multi-agent intelligence.