UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models
Mar 3, 2025
Sejoon Oh
Yiqiao Jin
Megha Sharma
Donghyun Kim
Gaurav Verma
Eric Ma
Srijan Kumar
Abstract
Multimodal LLMs are vulnerable to jailbreak attacks that exploit cross-modal interactions. We introduce UniGuard, a universal safety guardrail framework that defends multimodal LLMs against jailbreak attacks across image and text channels, providing strong resistance with low utility cost.
Type
Publication
AAAI 2025 Workshop on Deployable AI (DAI)