ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction
ScreenLLM introduces a stateful screen schema and key-frame extractor that compresses dynamic UI sessions into time-aware summaries, enabling efficient GUI understanding and action prediction with multimodal LLMs.
Apr 30, 2025