Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts
Apr 23, 2026
Kartik Sharma
Yiqiao Jin
Vineeth Rakesh
Yingtong Dou
Menghai Pan
Mahashweta Das
Srijan Kumar
Abstract
Aligning frozen large language models without modifying their weights is a key challenge for safe and adaptive deployment. We introduce Sysformer, a system that learns adaptive system prompts to safeguard frozen LLMs across diverse risk scenarios. Sysformer treats the system prompt as a learnable, query-conditioned intervention, enabling fine-grained safety control without parameter updates and improving robustness across multiple LLM families. ICLR 2026 acceptance rate: 28%.
Type
Publication
International Conference on Learning Representations (ICLR) 2026
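The abstract describes the system prompt as a learnable, query-conditioned intervention applied to a frozen LLM. The following is a minimal sketch of that idea, not the authors' implementation: it assumes the adaptation is a single cross-attention step from system-prompt embeddings to user-query embeddings with a residual connection. The names `PromptAdapter` and `build_llm_input`, the toy dimensions, and the random initialization are all hypothetical and chosen only for illustration. No training loop is shown, and the frozen LLM itself is left out because it is never modified.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(weight, vec):
    """Multiply a square matrix (nested lists) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


class PromptAdapter:
    """Hypothetical adapter: the only trainable component in this sketch."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def init_matrix():
            return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]

        self.dim = dim
        self.w_query = init_matrix()
        self.w_key = init_matrix()
        self.w_value = init_matrix()

    def __call__(self, system_embs, user_embs):
        """Return system-prompt embeddings adapted to the user query."""
        keys = [project(self.w_key, u) for u in user_embs]
        values = [project(self.w_value, u) for u in user_embs]
        adapted = []
        for s in system_embs:
            q = project(self.w_query, s)
            scores = [
                sum(a * b for a, b in zip(q, k)) / math.sqrt(self.dim)
                for k in keys
            ]
            attn = softmax(scores)
            update = [
                sum(a * v[i] for a, v in zip(attn, values))
                for i in range(self.dim)
            ]
            # Residual connection: start from the original system prompt.
            adapted.append([x + u for x, u in zip(s, update)])
        return adapted


def build_llm_input(adapter, system_embs, user_embs):
    """Prepend the adapted system prompt to the query embeddings.

    A frozen LLM would consume this sequence; its weights are never touched.
    """
    return adapter(system_embs, user_embs) + user_embs


# Toy usage: the same system prompt is adapted differently per query.
adapter = PromptAdapter(dim=4)
system_embs = [[0.1, 0.2, 0.3, 0.4], [0.0, -0.1, 0.2, 0.1]]
query_a = [[1.0, 0.0, 0.0, 0.0]]
query_b = [[0.0, 0.0, 0.0, 1.0], [0.5, 0.5, 0.0, 0.0]]
inputs_a = build_llm_input(adapter, system_embs, query_a)
inputs_b = build_llm_input(adapter, system_embs, query_b)
```

In this sketch only the adapter's three projection matrices would be trained, which is what allows the safety behavior to change without any parameter updates to the LLM. The system-prompt portion of `inputs_a` and `inputs_b` differs because each is conditioned on a different query.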