SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression

Wed, 01 Jul 2026 00:00:00 +0000

Abstract

Retrieval-augmented Generation (RAG) extends large language models with external knowledge, but balancing local factual precision with global knowledge coverage under strict context budgets remains a fundamental challenge. We propose SARA, a unified RAG framework that combines fine-grained natural-language spans with compact, interpretable semantic compression vectors. SARA introduces an iterative context refinement mechanism that uses compression vectors for dynamic reranking, reducing document redundancy while maximizing query informativeness. Across multiple datasets and open-source LLM families (Mistral, Llama, Gemma), SARA delivers consistent performance gains over strong RAG baselines.
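The abstract does not spell out how the reranking step works, so the following is only a minimal sketch of one common way to trade query relevance against redundancy under a fixed budget: a greedy, MMR-style selection over document vectors. It is not SARA's actual algorithm. The function names `cosine` and `select_contexts`, the `trade_off` parameter, and the use of plain cosine similarity over generic embedding vectors are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_contexts(query_vec, doc_vecs, budget, trade_off=0.5):
    """Greedily pick up to `budget` document indices (hypothetical helper).

    Each round scores every remaining document as
        trade_off * relevance_to_query - (1 - trade_off) * max_similarity_to_selected
    and keeps the best one, so near-duplicates of already chosen
    documents are penalized. This is an illustrative stand-in, not the
    method described in the paper.
    """
    selected = []
    remaining = list(range(len(doc_vecs)))

    def score(i):
        relevance = cosine(query_vec, doc_vecs[i])
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
            default=0.0,
        )
        return trade_off * relevance - (1.0 - trade_off) * redundancy

    while remaining and len(selected) < budget:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    query = [1.0, 0.5]
    docs = [[1.0, 0.4], [1.0, 0.41], [0.5, 1.0]]  # docs 0 and 1 are near-duplicates
    print(select_contexts(query, docs, budget=2))
```

Setting `trade_off=1.0` reduces the sketch to plain relevance ranking, while lower values push the selection toward documents that differ from those already chosen.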

Context Compression | Yiqiao Jin CS PhD @ Georgia Tech

