Efficient AI | Yiqiao Jin CS PhD @ Georgia Tech

SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression

Wed, 01 Jul 2026 00:00:00 +0000

Abstract

Retrieval-augmented generation (RAG) extends large language models with external knowledge, but balancing local factual precision with global knowledge coverage under strict context budgets remains a fundamental challenge. We propose SARA, a unified RAG framework that combines fine-grained natural-language spans with compact, interpretable semantic compression vectors. SARA introduces an iterative context refinement mechanism that uses compression vectors for dynamic reranking, reducing document redundancy while maximizing query informativeness. Across multiple datasets and open-source LLM families (Mistral, Llama, Gemma), SARA delivers consistent performance gains over strong RAG baselines.

UniSD: Towards a Unified Self-Distillation Framework for Large Language Models

Sat, 30 May 2026 00:00:00 +0000

Abstract

Self-distillation has emerged as a powerful technique for improving large language models without external teacher signals, but existing approaches are fragmented across diverse objectives, training signals, and model components. We introduce UniSD, a unified self-distillation framework that consolidates these directions into a single, modular formulation. UniSD enables systematic comparison of self-distillation variants and supports new combinations across data, representation, and decoding levels.

AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent

Wed, 04 Feb 2026 00:00:00 +0000

Abstract

Multi-agent systems achieve strong performance on complex tasks by orchestrating diverse roles, planners, and tool-using agents. However, deploying full multi-agent stacks is expensive and brittle. We introduce AgentArk, a distillation framework that compresses multi-agent intelligence into a single LLM agent. AgentArk decomposes multi-agent trajectories into role-conditioned skills and trains a single agent to reproduce the collaborative behavior of the original ensemble.

Efficient Knowledge Probing of Large Language Models by Adapting Pre-trained Embeddings

Sun, 01 Feb 2026 00:00:00 +0000

Abstract

Probing what a large language model knows is essential for safe deployment, but exhaustive probing is prohibitively expensive. We propose an efficient knowledge probing approach that adapts pre-trained embeddings to query LLM knowledge with substantially reduced compute, while preserving the fidelity of standard probing protocols.

A Survey on Efficient LLM Training: From Data-centric Perspectives

Thu, 31 Jul 2025 00:00:00 +0000

Abstract

Efficient training of large language models has become a central concern as model and data scales grow. This survey reviews efficient LLM training from a data-centric perspective, organizing techniques around data selection, mixing, ordering, and synthesis.
