<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Multimodal Models | Yiqiao Jin CS PhD @ Georgia Tech</title><link>https://ahren09.github.io/tags/multimodal-models/</link><atom:link href="https://ahren09.github.io/tags/multimodal-models/index.xml" rel="self" type="application/rss+xml"/><description>Multimodal Models</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Tue, 03 Feb 2026 00:00:00 +0000</lastBuildDate><image><url>https://ahren09.github.io/media/icon_hu_eee6347cbdb2cc3f.png</url><title>Multimodal Models</title><link>https://ahren09.github.io/tags/multimodal-models/</link></image><item><title>Consistency Should Be the Priority for Unified Multimodal Models</title><link>https://ahren09.github.io/publication/icml26_consistency_umm/</link><pubDate>Tue, 03 Feb 2026 00:00:00 +0000</pubDate><guid>https://ahren09.github.io/publication/icml26_consistency_umm/</guid><description>&lt;h2 id="abstract">Abstract&lt;/h2>
&lt;p>Unified multimodal models (UMMs) aim to handle understanding and generation across modalities within a single architecture. Despite rapid progress, current UMMs frequently produce inconsistent outputs across views, modalities, and prompts. In this position paper, we argue that consistency, not capability, should be the priority research target for UMMs.&lt;/p>
</description></item><item><title>CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries</title><link>https://ahren09.github.io/publication/cvpr25_culturevlm/</link><pubDate>Wed, 11 Jun 2025 00:00:00 +0000</pubDate><guid>https://ahren09.github.io/publication/cvpr25_culturevlm/</guid><description>&lt;h2 id="abstract">Abstract&lt;/h2>
&lt;p>Vision-language models (VLMs) are deployed globally but exhibit substantial cultural blind spots. We introduce CultureVLM, a framework that characterizes and improves cultural understanding of VLMs across more than 100 countries.&lt;/p>
</description></item></channel></rss>