<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Efficient AI | Yiqiao Jin CS PhD @ Georgia Tech</title><link>https://ahren09.github.io/tags/efficient-ai/</link><atom:link href="https://ahren09.github.io/tags/efficient-ai/index.xml" rel="self" type="application/rss+xml"/><description>Efficient AI</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Wed, 01 Jul 2026 00:00:00 +0000</lastBuildDate><image><url>https://ahren09.github.io/media/icon_hu_eee6347cbdb2cc3f.png</url><title>Efficient AI</title><link>https://ahren09.github.io/tags/efficient-ai/</link></image><item><title>SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression</title><link>https://ahren09.github.io/publication/acl26_sara/</link><pubDate>Wed, 01 Jul 2026 00:00:00 +0000</pubDate><guid>https://ahren09.github.io/publication/acl26_sara/</guid><description>&lt;h2 id="abstract">Abstract&lt;/h2>
&lt;p>Retrieval-augmented Generation (RAG) extends large language models with external knowledge, but balancing local factual precision with global knowledge coverage under strict context budgets remains a fundamental challenge. We propose SARA, a unified RAG framework that combines fine-grained natural-language spans with compact, interpretable semantic compression vectors. SARA introduces an iterative context refinement mechanism that uses compression vectors for dynamic reranking, reducing document redundancy while maximizing query informativeness. Across multiple datasets and open-source LLM families (Mistral, Llama, Gemma), SARA delivers consistent performance gains over strong RAG baselines.&lt;/p>
&lt;h2 id="links">Links&lt;/h2>
&lt;ul>
&lt;/ul></description></item><item><title>UniSD: Towards a Unified Self-Distillation Framework for Large Language Models</title><link>https://ahren09.github.io/publication/neurips26_unisd/</link><pubDate>Sat, 30 May 2026 00:00:00 +0000</pubDate><guid>https://ahren09.github.io/publication/neurips26_unisd/</guid><description>&lt;h2 id="abstract">Abstract&lt;/h2>
&lt;p>Self-distillation has emerged as a powerful technique for improving large language models without external teacher signals, but existing approaches are fragmented across diverse objectives, training signals, and model components. We introduce UniSD, a unified self-distillation framework that consolidates these directions into a single, modular formulation. UniSD enables systematic comparison of self-distillation variants and supports new combinations across data, representation, and decoding levels.&lt;/p>
&lt;h2 id="links">Links&lt;/h2>
&lt;ul>
&lt;/ul></description></item><item><title>AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent</title><link>https://ahren09.github.io/publication/neurips26_agentark/</link><pubDate>Wed, 04 Feb 2026 00:00:00 +0000</pubDate><guid>https://ahren09.github.io/publication/neurips26_agentark/</guid><description>&lt;h2 id="abstract">Abstract&lt;/h2>
&lt;p>Multi-agent systems achieve strong performance on complex tasks by orchestrating diverse roles, planners, and tool-using agents. However, deploying full multi-agent stacks is expensive and brittle. We introduce AgentArk, a distillation framework that compresses multi-agent intelligence into a single LLM agent. AgentArk decomposes multi-agent trajectories into role-conditioned skills and trains a single agent to reproduce the collaborative behavior of the original ensemble.&lt;/p>
&lt;h2 id="links">Links&lt;/h2>
&lt;ul>
&lt;/ul></description></item><item><title>Efficient Knowledge Probing of Large Language Models by Adapting Pre-trained Embeddings</title><link>https://ahren09.github.io/publication/aaai26_knowledge_probing/</link><pubDate>Sun, 01 Feb 2026 00:00:00 +0000</pubDate><guid>https://ahren09.github.io/publication/aaai26_knowledge_probing/</guid><description>&lt;h2 id="abstract">Abstract&lt;/h2>
&lt;p>Probing what a large language model knows is essential for safe deployment, but exhaustive probing is prohibitively expensive. We propose an efficient knowledge-probing approach that adapts pre-trained embeddings to query LLM knowledge with substantially reduced compute, while preserving the fidelity of standard probing protocols.&lt;/p>
&lt;h2 id="links">Links&lt;/h2>
&lt;ul>
&lt;/ul></description></item><item><title>A Survey on Efficient LLM Training: From Data-centric Perspectives</title><link>https://ahren09.github.io/publication/acl25_efficient_llm_training/</link><pubDate>Thu, 31 Jul 2025 00:00:00 +0000</pubDate><guid>https://ahren09.github.io/publication/acl25_efficient_llm_training/</guid><description>&lt;h2 id="abstract">Abstract&lt;/h2>
&lt;p>Efficient training of large language models has become a central concern as model and data scales grow. This survey reviews efficient LLM training from a data-centric perspective, organizing techniques around data selection, mixing, ordering, and synthesis.&lt;/p>
&lt;h2 id="links">Links&lt;/h2>
&lt;ul>
&lt;/ul></description></item></channel></rss>