Protein Large Language Models: A Comprehensive Survey

Nov 1, 2025

Yijia Xiao

Wanjia Zhao

Junkai Zhang

Yiqiao Jin

Han Zhang

Zhicheng Ren

Renliang Sun

Haixin Wang

Guancheng Wan

Pan Lu

Xiao Luo

Yu Zhang

James Zou

Yizhou Sun

Wei Wang

Abstract

Protein large language models (Protein LLMs) have rapidly emerged as a transformative paradigm for protein understanding, generation, and design. This survey provides a comprehensive overview of Protein LLMs, organizing the field along architectures, training objectives, datasets, downstream tasks, and applications across biology, chemistry, and medicine. We discuss key challenges, open problems, and future research directions, and provide a unified taxonomy for navigating this rapidly evolving area.

EMNLP 2025 acceptance rate: 22.2%.

Type

Conference paper

Publication

Conference on Empirical Methods in Natural Language Processing (EMNLP) 2025
