UniSD: Towards a Unified Self-Distillation Framework for Large Language Models
May 30, 2026
Yiqiao Jin
Yiyang Wang
Lucheng Fu
Yijia Xiao
Yinyi Luo
Haoxin Liu
B. Aditya Prakash
Josiah Hester
Jindong Wang
Srijan Kumar
Abstract
Self-distillation has emerged as a powerful technique for improving large language models without external teacher signals, but existing approaches are fragmented across diverse objectives, training signals, and model components. We introduce UniSD, a unified self-distillation framework that consolidates these directions into a single, modular formulation. UniSD enables systematic comparison of self-distillation variants and supports new combinations across data, representation, and decoding levels, providing a principled foundation for efficient and adaptive LLM training.
Publication
Under Review at NeurIPS 2026 (Preprint)