Large Language Models

SlideAgent: Hierarchical Agentic Framework for Multi-Page Visual Document Understanding

We introduce SlideAgent, a versatile agentic framework for understanding multi-modal, multi-page, and multi-layout documents.

Nov 27, 2025

SlideAgent: Hierarchical Agentic Framework for Multi-Page Visual Document Understanding

Multi-page visual documents such as manuals, brochures, presentations, and posters convey key information through layout, colors, icons, and cross-slide references. While large language models (LLMs) open up new opportunities for document understanding, current systems struggle with complex, multi-page visual documents, particularly with fine-grained reasoning over individual elements and pages. We introduce SlideAgent, a versatile agentic framework for understanding multi-modal, multi-page, and multi-layout documents, especially slide decks. SlideAgent employs specialized agents and decomposes reasoning into three levels (global, page, and element) to construct a structured, query-agnostic representation that captures both overarching themes and detailed visual or textual cues. During inference, SlideAgent selectively activates the relevant agents for multi-level reasoning and integrates their outputs into coherent, context-aware answers. Extensive experiments show that SlideAgent achieves significant improvements over both proprietary models (+7.9 over GPT-4o) and open-source models (+9.8 over InternVL3-8B).

Sep 26, 2025

Welcome to My Research Journey

An introduction to my research interests and recent work in Large Language Models, Multimodal Learning, and Social Computing.

Aug 15, 2025

Developing LLM Systems for Social Applications

A framework and benchmark to evaluate LLMs' multilingual capabilities on healthcare queries, revealing significant performance gaps across languages.

Dec 10, 2024

AgentReview: Exploring Peer Review Dynamics with LLM Agents

Peer review is fundamental to the integrity and advancement of scientific publication. Traditional analyses of peer review often rely on exploratory statistics over existing peer review data. Such analyses do not adequately address the multivariate nature of the process or account for latent variables, and they are further constrained by privacy concerns due to the sensitive nature of the data. We introduce AgentReview, the first large language model (LLM)-based peer review simulation framework, which effectively disentangles the impacts of multiple latent factors and addresses the privacy issue. Our study reveals significant insights, including a notable 37.1% variation in paper decisions due to reviewers' biases, supported by sociological theories such as social influence theory, altruism fatigue, and authority bias. We believe that this study could offer valuable insights to improve the design of peer review mechanisms. Our code is available at https://github.com/Ahren09/AgentReview.

Nov 12, 2024

CompeteAI: Understanding the Competition Dynamics of Large Language Model-based Agents

This work studies the competition dynamics among LLM-based agents, revealing emergent behaviors and strategic patterns in multi-agent systems....

Apr 30, 2024

Prototypical Reward Network for Data-Efficient RLHF

We propose a prototypical reward network that enables data-efficient reinforcement learning from human feedback (RLHF) for large language models....

Jan 1, 2024

MM-SOC: Benchmarking Multimodal Large Language Models in Social Media Platforms

Social media platforms are hubs for multimodal information exchange, encompassing text, images, and videos, making it challenging for machines to comprehensively understand the information. Multimodal...

Jan 1, 2024

Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries

We present a framework and benchmark to evaluate LLMs' multilingual capabilities in healthcare queries, revealing significant performance gaps across languages and providing insights for improving hea...

Jan 1, 2024