A systematic study showing that reasoning capabilities alone are insufficient for LLMs in multi-turn mental health conversations, isolating failure modes that demand additional safety- and empathy-aware design.
Jul 1, 2026
MedHalu is a fine-grained benchmark for studying hallucinations in LLM responses to consumer healthcare queries, analyzing hallucination patterns across models, query types, and medical specialties.
Jun 1, 2026
Sysformer learns adaptive, query-conditioned system prompts to safeguard frozen large language models, providing fine-grained safety control without modifying model weights.
Apr 23, 2026
UniGuard is a universal safety guardrail for multimodal LLMs, defending against cross-modal jailbreak attacks across image and text channels with low utility cost.
Mar 3, 2025
PrivacyMind teaches LLMs to be contextual privacy protection learners that recognize sensitive content and adapt their outputs accordingly, preserving utility while reducing leakage.
Nov 12, 2024