SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression

Jul 1, 2026

Yiqiao Jin

Kartik Sharma

Vineeth Rakesh

Yingtong Dou

Menghai Pan

Mahashweta Das

Srijan Kumar



Abstract

Retrieval-augmented Generation (RAG) extends large language models with external knowledge, but balancing local factual precision with global knowledge coverage under strict context budgets remains a fundamental challenge. We propose SARA, a unified RAG framework that combines fine-grained natural-language spans with compact, interpretable semantic compression vectors. SARA introduces an iterative context refinement mechanism that uses compression vectors for dynamic reranking, reducing document redundancy while maximizing query informativeness. Across multiple datasets and open-source LLM families (Mistral, Llama, Gemma), SARA delivers consistent performance gains over strong RAG baselines while operating under tight context budgets. ACL 2026 acceptance rate: 19.0%.
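The abstract does not spell out SARA's reranking procedure. To illustrate the general trade-off it names (rewarding query informativeness while penalizing redundancy among already-chosen documents, under a fixed context budget), the sketch below swaps in Maximal Marginal Relevance (MMR). MMR is a standard, different technique and is not SARA's method. Plain dense vectors stand in for SARA's compression vectors, and the function names, the token budget, and the weight `lam` are hypothetical choices made only for this illustration.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def select_context(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    doc_tokens: Sequence[int],
    budget: int,
    lam: float = 0.7,
) -> list[int]:
    """Greedy MMR-style selection under a token budget (hypothetical helper,
    NOT SARA's algorithm; vectors are stand-ins for compression vectors).

    Each round scores every remaining document as
        lam * sim(query, doc) - (1 - lam) * max sim(doc, already selected)
    and adds the highest-scoring document that still fits in the budget.
    Returns document indices in the order they were selected.
    """
    selected: list[int] = []
    remaining = set(range(len(doc_vecs)))
    used = 0
    while remaining:
        best, best_score = None, -math.inf
        for i in sorted(remaining):  # sorted: deterministic tie-breaking
            if used + doc_tokens[i] > budget:
                continue  # document does not fit in the remaining budget
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        if best is None:
            break  # nothing left fits in the budget
        selected.append(best)
        remaining.remove(best)
        used += doc_tokens[best]
    return selected


if __name__ == "__main__":
    query = [1.0, 0.9, 0.0]
    docs = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # doc 1 duplicates doc 0
    print(select_context(query, docs, [100, 100, 100], budget=200))  # → [0, 2]
```

In this toy run, document 1 is as relevant as document 0 but is an exact duplicate of it, so once document 0 is selected the redundancy penalty lets the less relevant but non-redundant document 2 take the remaining budget. A larger `lam` favors relevance; a smaller one favors diversity.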

Type

Conference paper

Publication

Annual Meeting of the Association for Computational Linguistics (ACL) 2026, Main Conference
