A Survey on Efficient LLM Training: From Data-centric Perspectives

Jul 31, 2025

Junyu Luo

Bohan Wu

Xiao Luo

Zhiping Xiao

Yiqiao Jin

Rong-Cheng Tu

Nan Yin

Yifan Wang

Jingyang Yuan

Wei Ju

Ming Zhang

Abstract

Efficient training of large language models has become a central concern as model and data scales grow. This survey reviews efficient LLM training from a data-centric perspective, organizing techniques around data selection, mixing, ordering, and synthesis. We discuss trade-offs between compute, data quality, and downstream performance, and identify open challenges in scaling data-centric efficiency to frontier LLMs.

Type

Conference paper

Publication

Annual Meeting of the Association for Computational Linguistics (ACL) 2025, Main Conference
