SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression
Jul 1, 2026·
,,,,,,·
1 min read
Yiqiao Jin
Kartik Sharma
Vineeth Rakesh
Yingtong Dou
Menghai Pan
Mahashweta Das
Srijan Kumar
Abstract
Retrieval-augmented Generation (RAG) extends large language models with external knowledge, but balancing local factual precision with global knowledge coverage under strict context budgets remains a fundamental challenge. We propose SARA, a unified RAG framework that combines fine-grained natural-language spans with compact, interpretable semantic compression vectors. SARA introduces an iterative context refinement mechanism that uses compression vectors for dynamic reranking, reducing document redundancy while maximizing query informativeness. Across multiple datasets and open-source LLM families (Mistral, Llama, Gemma), SARA delivers consistent performance gains over strong RAG baselines while operating under tight context budgets. ACL 2026 acceptance rate: 19.0%.
Type
Publication
Annual Meeting of the Association for Computational Linguistics (ACL) 2026, Main Conference
Abstract
Retrieval-augmented Generation (RAG) extends large language models with external knowledge, but balancing local factual precision with global knowledge coverage under strict context budgets remains a fundamental challenge. We propose SARA, a unified RAG framework that combines fine-grained natural-language spans with compact, interpretable semantic compression vectors. SARA introduces an iterative context refinement mechanism that uses compression vectors for dynamic reranking, reducing document redundancy while maximizing query informativeness. Across multiple datasets and open-source LLM families (Mistral, Llama, Gemma), SARA delivers consistent performance gains over strong RAG baselines.