Healthcare

MedHalu: Hallucinations in Responses to Healthcare Queries by Large Language Models

MedHalu is a fine-grained benchmark for studying hallucinations in LLM responses to consumer healthcare queries, analyzing hallucination patterns across models, query types, and medical specialties.

Jun 1, 2026

XLingEval

Cross-lingual evaluation framework that exposes substantial multilingual gaps in LLM healthcare responses. Featured by Scientific American, The World, and Georgia Tech News. WebConf 2024 Oral.

Apr 1, 2024

Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries

We present a framework and benchmark to evaluate LLMs' multilingual capabilities on healthcare queries, revealing significant performance gaps across languages and providing insights for improving hea...

Jan 1, 2024