ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction

Wed, 30 Apr 2025 00:00:00 +0000

Abstract

We introduce ScreenLLM, a specialized multimodal LLM for Graphical User Interface (GUI) understanding and action prediction. ScreenLLM rests on two components: a stateful screen schema that represents dynamic user sessions as compact, time-aware textual summaries, and an efficient key-frame extraction method that uses second-order pixel changes to isolate significant UI transitions.
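To make the idea of key-frame extraction from second-order pixel changes concrete, here is a minimal sketch of one possible reading. It is not the authors' implementation. The abstract does not specify the exact formulation, so the following are all assumptions made for illustration: the function names `mean_abs_diff` and `select_key_frames`, the use of the mean absolute grayscale difference as the first-order change, the absolute difference of consecutive first-order changes as the second-order change, and the `threshold` value.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image given as rows of pixel intensities.
Frame = Sequence[Sequence[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """First-order change: mean absolute pixel difference between two frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def select_key_frames(frames: List[Frame], threshold: float = 10.0) -> List[int]:
    """Return indices of frames where the rate of pixel change shifts sharply.

    Hypothetical rule: a frame is kept when the second-order change (the
    difference between consecutive first-order changes) exceeds `threshold`.
    The first frame is always kept as the initial state.
    """
    if not frames:
        return []

    # first_order[i] is the change between frame i and frame i + 1.
    first_order = [
        mean_abs_diff(frames[t], frames[t - 1]) for t in range(1, len(frames))
    ]

    key_frames = [0]
    for i in range(1, len(first_order)):
        second_order = abs(first_order[i] - first_order[i - 1])
        if second_order > threshold:
            # first_order[i] ends at frame i + 1, so that frame is recorded.
            key_frames.append(i + 1)
    return key_frames


if __name__ == "__main__":
    def blank(value: int) -> Frame:
        """Build a tiny 2x2 frame filled with a single intensity."""
        return [[value, value], [value, value]]

    # Three static frames, an abrupt screen change, then three static frames.
    session = [blank(0), blank(0), blank(0), blank(100), blank(100), blank(100)]
    print(select_key_frames(session))
```

Under this rule, a steady drift such as a smooth scroll or a blinking cursor produces a roughly constant first-order change and is ignored, while both the onset of an abrupt transition and the frame where the screen settles are flagged. A real pipeline would likely operate on decoded video frames with a vectorized array library rather than nested Python lists.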

GUI Understanding | Yiqiao Jin CS PhD @ Georgia Tech

