Overall Framework
SARA addresses key challenges in RAG through a hybrid compression strategy that balances local precision and global knowledge coverage. The framework operates through a two-stage training procedure:
🎯 Stage 1: Compression Learning
Lightweight compressor aligns embeddings with LLM token space
Learns to reconstruct original contexts from vectors
Progressive, curriculum-style training on increasingly complex text chunks
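The Stage 1 steps above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the linear compressor, the dimensions `EMB_DIM`/`LLM_DIM`, the number of soft tokens `K`, and the MSE stand-in for the reconstruction objective are all assumptions (in the real pipeline the LLM would decode the original context from the soft tokens).

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 32   # retrieval-embedding dimension (assumed)
LLM_DIM = 48   # LLM token-embedding dimension (assumed)
K = 4          # number of compressed "soft token" vectors per passage (assumed)

# Lightweight compressor: a single linear map from the passage embedding
# to K vectors living in the LLM token space.
W = rng.normal(scale=0.1, size=(K * LLM_DIM, EMB_DIM))

def compress(passage_emb: np.ndarray) -> np.ndarray:
    """Map a passage embedding to K soft-token vectors of shape (K, LLM_DIM)."""
    return (W @ passage_emb).reshape(K, LLM_DIM)

# Toy reconstruction target: in the real pipeline, the frozen LLM would be
# trained to regenerate the original context from the soft tokens; here a
# fixed matrix of "token embeddings" stands in for that signal.
passage_emb = rng.normal(size=EMB_DIM)
target_tokens = rng.normal(size=(K, LLM_DIM))

def loss(W_: np.ndarray) -> float:
    soft = (W_ @ passage_emb).reshape(K, LLM_DIM)
    return float(np.mean((soft - target_tokens) ** 2))

# A few steps of plain gradient descent on the MSE reconstruction loss.
lr = 0.05
for _ in range(200):
    soft = (W @ passage_emb).reshape(K, LLM_DIM)
    grad = (2.0 / (K * LLM_DIM)) * np.outer(
        (soft - target_tokens).reshape(-1), passage_emb
    )
    W -= lr * grad

print(round(loss(W), 4))
```

The key property this toy setup shares with the described stage is that only the compressor's parameters are updated: the passage embedding and the reconstruction target (playing the role of the frozen LLM) stay fixed.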
⚡ Stage 2: Instruction-tuning & Inference
Top-k passages in natural language + compressed vectors
Iterative selection for relevance and diversity
Embedding novelty + CSI scoring
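The iterative, novelty-aware selection in Stage 2 can be illustrated with a greedy MMR-style loop: each round picks the candidate that best trades off query relevance against similarity to already-selected evidence. This is a generic sketch, not the paper's method — plain query-passage cosine similarity stands in for the CSI score, and the weighting `lam` is an assumed hyperparameter.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_evidence(query, passages, k, lam=0.4):
    """Greedy selection: score = lam * relevance - (1 - lam) * redundancy,
    where redundancy is the max similarity to any already-selected passage."""
    selected, remaining = [], list(range(len(passages)))
    while remaining and len(selected) < k:
        best, best_score = None, -np.inf
        for i in remaining:
            rel = cosine(query, passages[i])
            red = max((cosine(passages[i], passages[j]) for j in selected),
                      default=0.0)
            score = lam * rel - (1 - lam) * red
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy example: passages 0 and 1 are near-duplicates; passage 2 is less
# relevant but novel, so it is chosen over the redundant duplicate.
query = np.array([1.0, 0.0, 0.0])
passages = [np.array([0.9, 0.1, 0.0]),
            np.array([0.9, 0.11, 0.0]),
            np.array([0.6, 0.0, 0.8])]
print(select_evidence(query, passages, k=2))
```

Running the example selects passage 0 (most relevant) and then passage 2, skipping the near-duplicate — the behavior the embedding-novelty term is meant to produce.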
The framework represents contexts at two complementary levels: 1) fine-grained natural-language spans that preserve critical entities and numerical values, and 2) compact, interpretable vectors that summarize high-level semantics. An iterative evidence-selection module uses the compression vectors to dynamically rerank contexts, maximizing information density within a strict context budget.
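The two-level representation under a fixed budget can be sketched as a simple greedy assembly: the highest-ranked passages are kept verbatim as text while they fit, and later passages are swapped for fixed-size compressed vectors. The whitespace token count, the per-passage soft-token cost, and the greedy policy are all assumptions for illustration, not the paper's exact procedure.

```python
K_SOFT_TOKENS = 4  # budget cost of one compressed passage, in token slots (assumed)

def assemble_context(ranked_passages, budget):
    """Greedily keep passages as text while they fit; compress the rest."""
    text, compressed, used = [], [], 0
    for p in ranked_passages:
        cost = len(p.split())            # crude whitespace token count
        if used + cost <= budget:
            text.append(p)               # fine-grained natural-language span
            used += cost
        elif used + K_SOFT_TOKENS <= budget:
            compressed.append(p)         # would be fed as compressed vectors
            used += K_SOFT_TOKENS
    return text, compressed, used

ranked = [
    "Paris is the capital of France .",
    "The Eiffel Tower was completed in 1889 .",
    "France is a country in Western Europe with many regions .",
]
text, comp, used = assemble_context(ranked, budget=20)
print(len(text), len(comp), used)
```

With a budget of 20 slots, the first two passages fit as text (15 slots) and the third is carried only as a 4-slot compressed representation, so no retrieved evidence is dropped outright.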