Paper-Conference

A Survey on Efficient LLM Training: From Data-centric Perspectives

A survey of efficient LLM training organized around data-centric techniques — selection, mixing, ordering, and synthesis — and their trade-offs with compute and downstream performance.

Jul 31, 2025

ProteinGPT: Multimodal LLM for Protein Property Prediction and Structure Understanding

ProteinGPT is a multimodal LLM that integrates protein sequence and structural representations in a unified generative interface for property prediction and structure understanding.

Jul 18, 2025

CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries

CultureVLM characterizes and improves cultural understanding of vision-language models across more than 100 countries using culturally grounded benchmarks and training procedures.

Jun 11, 2025

ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction

ScreenLLM introduces a stateful screen schema and key-frame extractor that compresses dynamic UI sessions into time-aware summaries, enabling efficient GUI understanding and action prediction with multimodal LLMs.

Apr 30, 2025

UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models

UniGuard is a universal safety guardrail for multimodal LLMs, defending against cross-modal jailbreak attacks across image and text channels with low utility cost.

Mar 3, 2025

SciEvo: A 2 Million, 30-Year Cross-disciplinary Dataset for Temporal Scientometric Analysis

SciEvo is a dataset of more than 2 million papers spanning 30 years (1995-2024), built for temporal scientometric analysis.

Mar 3, 2025

RNA-GPT: Multimodal Generative System for RNA Sequence Understanding

RNA-GPT is a multimodal generative system that combines RNA sequence reasoning with structural cues for property prediction, retrieval, and natural-language interaction over RNA data.

Dec 13, 2024

PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners

PrivacyMind teaches LLMs to be contextual privacy protection learners that recognize sensitive content in context and adapt outputs accordingly, preserving utility while reducing leakage.

Nov 12, 2024

AgentReview: Exploring Peer Review Dynamics with LLM Agents

Peer review is fundamental to the integrity and advancement of scientific publication. Traditional methods of peer review analyses often rely on exploration and statistics of existing peer review data...

Nov 12, 2024

Towards Fair Graph Anomaly Detection: Problem, Benchmark Datasets, and Evaluation

We address fairness issues in graph anomaly detection, providing benchmark datasets and comprehensive evaluation frameworks for fair anomaly detection on graphs.

Jul 4, 2024