UniSD: Towards a Unified Self-Distillation Framework for Large Language Models

May 30, 2026
Yiqiao Jin, Yiyang Wang, Lucheng Fu, Yijia Xiao, Yinyi Luo, Haoxin Liu, B. Aditya Prakash, Josiah Hester, Jindong Wang, Srijan Kumar
Abstract
Self-distillation has emerged as a powerful technique for improving large language models without external teacher signals, but existing approaches are fragmented across diverse objectives, training signals, and model components. We introduce UniSD, a unified self-distillation framework that consolidates these directions into a single, modular formulation. UniSD enables systematic comparison of self-distillation variants and supports new combinations across data, representation, and decoding levels, providing a principled foundation for efficient and adaptive LLM training.
Publication
Under Review at NeurIPS 2026 (Preprint)


Yiqiao Jin
Ph.D. Candidate in Computer Science
My research focuses on adaptive and efficient AI systems, with emphasis on LLM agents, agent memory, self-distillation, multimodal LLMs, and structured multi-agent intelligence.