Consistency Should Be the Priority for Unified Multimodal Models

3 February 2026

Abstract

Unified multimodal models (UMMs) aim to handle understanding and generation across modalities within a single architecture. Despite rapid progress, current UMMs frequently produce inconsistent outputs across views, modalities, and prompts. In this position paper, we argue that consistency, not capability, should be the priority research target for UMMs.

CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries

11 June 2025

Abstract

Vision-language models (VLMs) are deployed globally but exhibit substantial cultural blind spots. We introduce CultureVLM, a framework that characterizes and improves cultural understanding of VLMs across more than 100 countries.

Multimodal Models | Yiqiao Jin CS PhD @ Georgia Tech
