MM-BizRAG: Rethinking Multimodal Retrieval-Augmented Generation for General Purpose Enterprise Q&A

Jul 1, 2026 · Hanoz Bhathena, Parin Rajesh Jhaveri, Rohan Mittal, Prateek Singh, Aymen Kallala, Rachneet Kaur, Yiqiao Jin, Zhen Zeng, Adwait Ratnaparkhi, Denis Kochedykov
Abstract
Enterprise question answering frequently spans heterogeneous modalities: text, tables, charts, scanned documents, and structured databases. We introduce MM-BizRAG, a multimodal retrieval-augmented generation framework designed for general-purpose enterprise Q&A. MM-BizRAG combines modality-aware retrieval, structured-context fusion, and grounded generation, and is evaluated on enterprise-realistic workloads.
Publication
Annual Meeting of the Association for Computational Linguistics (ACL) 2026, Industry Track

Authors
Yiqiao Jin
Ph.D. Candidate in Computer Science
My research focuses on adaptive and efficient AI systems, with emphasis on LLM agents, agent memory, self-distillation, multimodal LLMs, and structured multi-agent intelligence.