Protein Large Language Models: A Comprehensive Survey

Nov 1, 2025 · Yijia Xiao, Wanjia Zhao, Junkai Zhang, Yiqiao Jin, Han Zhang, Zhicheng Ren, Renliang Sun, Haixin Wang, Guancheng Wan, Pan Lu, Xiao Luo, Yu Zhang, James Zou, Yizhou Sun, Wei Wang
Abstract
Protein large language models (Protein LLMs) have rapidly emerged as a transformative paradigm for protein understanding, generation, and design. This survey provides a comprehensive overview of Protein LLMs, organizing the field along architectures, training objectives, datasets, downstream tasks, and applications across biology, chemistry, and medicine. We discuss key challenges, open problems, and future research directions, and provide a unified taxonomy for navigating this rapidly evolving area. EMNLP 2025 acceptance rate: 22.2%.
Publication
Conference on Empirical Methods in Natural Language Processing (EMNLP) 2025

Yiqiao Jin
Authors
Ph.D. Candidate in Computer Science
My research focuses on adaptive and efficient AI systems, with emphasis on LLM agents, agent memory, self-distillation, multimodal LLMs, and structured multi-agent intelligence.