UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models

Mar 3, 2025
Sejoon Oh, Yiqiao Jin, Megha Sharma, Donghyun Kim, Gaurav Verma, Eric Ma, Srijan Kumar
Abstract
Multimodal LLMs are vulnerable to jailbreak attacks that exploit cross-modal interactions. We introduce UniGuard, a universal safety guardrail framework that defends multimodal LLMs against jailbreak attacks across image and text channels, providing strong resistance with low utility cost.
Publication: AAAI 2025 Workshop on Deployable AI (DAI)

Yiqiao Jin, Ph.D. Candidate in Computer Science
My research focuses on adaptive and efficient AI systems, with emphasis on LLM agents, agent memory, self-distillation, multimodal LLMs, and structured multi-agent intelligence.