LLM Safety | Yiqiao Jin CS PhD @ Georgia Tech

Reasoning Is Not All You Need: Examining LLMs for Multi-Turn Mental Health Conversations

Wed, 01 Jul 2026 00:00:00 +0000

Abstract

Mental health support over multi-turn conversations stresses LLM reasoning, empathy, and safety in distinct ways. We systematically examine the limits of reasoning-focused LLMs in multi-turn mental health conversations, isolating failure modes that pure reasoning cannot address.

MedHalu: Hallucinations in Responses to Healthcare Queries by Large Language Models

Mon, 01 Jun 2026 00:00:00 +0000

Abstract

Large language models are increasingly used for consumer healthcare queries, but their responses can contain subtle hallucinations with serious implications for patient safety. We introduce MedHalu, a benchmark for studying hallucinations in LLM responses to healthcare queries, with fine-grained annotations of hallucination types and severity.

Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts

Thu, 23 Apr 2026 00:00:00 +0000

Abstract

Aligning frozen large language models without modifying their weights is a key challenge for safe and adaptive deployment. We introduce Sysformer, a system that learns adaptive system prompts to safeguard frozen LLMs across diverse risk scenarios. Sysformer treats the system prompt as a learnable, query-conditioned intervention, enabling fine-grained safety control without parameter updates and improving robustness across multiple LLM families.

UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models

Mon, 03 Mar 2025 00:00:00 +0000

Abstract

Multimodal LLMs are vulnerable to jailbreak attacks that exploit cross-modal interactions. We introduce UniGuard, a universal safety guardrail framework that defends multimodal LLMs against jailbreak attacks across image and text channels.

PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners

Tue, 12 Nov 2024 00:00:00 +0000

Abstract

Deploying LLMs on private text data raises serious contextual privacy concerns. We introduce PrivacyMind, which teaches LLMs to be contextual privacy protection learners that recognize sensitive content in context and adapt their outputs accordingly.
