Efficient AI

SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression

SARA is a unified RAG framework that balances local factual precision with global coverage by combining natural-language spans with compact semantic compression vectors, achieving consistent gains under strict context budgets.

Jul 1, 2026

UniSD: Towards a Unified Self-Distillation Framework for Large Language Models

UniSD unifies the fragmented landscape of self-distillation for large language models, providing a principled framework that supports systematic comparison and new combinations across data, representation, and decoding levels.

May 30, 2026

UniSD

A unified self-distillation framework for large language models that consolidates fragmented self-distillation directions into a single, modular formulation across data, representation, and decoding levels. Under review at NeurIPS 2026.

May 30, 2026

SARA

Selective and Adaptive Retrieval-augmented Generation with Context Compression. A unified RAG framework that combines fine-grained natural-language spans with compact semantic compression vectors under strict context budgets. Accepted at the ACL 2026 main conference.

Mar 15, 2026

AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent

AgentArk distills the collaborative behavior of multi-agent systems into a single LLM agent, decomposing trajectories into role-conditioned skills and recovering most of the ensemble's performance at a fraction of the cost.

Feb 4, 2026

Efficient Knowledge Probing of Large Language Models by Adapting Pre-trained Embeddings

An efficient approach to probing what LLMs know: it adapts pre-trained embeddings to query model knowledge at substantially reduced compute.

Feb 1, 2026

A Survey on Efficient LLM Training: From Data-centric Perspectives

A survey of efficient LLM training organized around data-centric techniques (selection, mixing, ordering, and synthesis) and their trade-offs with compute and downstream performance.

Jul 31, 2025