UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models

Mar 3, 2025

Sejoon Oh, Yiqiao Jin, Megha Sharma, Donghyun Kim, Gaurav Verma, Eric Ma, Srijan Kumar


Abstract

Multimodal LLMs are vulnerable to jailbreak attacks that exploit cross-modal interactions. We introduce UniGuard, a universal safety guardrail framework that defends multimodal LLMs against jailbreak attacks across image and text channels, providing strong resistance with low utility cost.

Type

Conference paper

Publication

AAAI 2025 Workshop on Deployable AI (DAI)


Links

arXiv

Last updated on Mar 3, 2025

LLM Safety, Multimodal LLMs, Jailbreak Defense

Authors

Yiqiao Jin

Ph.D. Candidate in Computer Science

My research focuses on adaptive and efficient AI systems, with emphasis on LLM agents, agent memory, self-distillation, multimodal LLMs, and structured multi-agent intelligence.
