Attend to Anything: Foundation Model for Unified Human Attention Modeling

arXiv cs.CV | 2026-06-03 | News | en
Authors: Wenzhuo Zhao, Ronghao Xian, Keren Fu, Qijun Zhao

Abstract

arXiv:2606.03540v1 (announce type: new)

Existing human attention (saliency) modeling methods remain highly fragmented across modalities, scenes, and task formulations. Consequently, even with increasing model capacity and data scale, current models remain predominantly scene-dependent and task-specific, and fail to generalize in real-world applications. To address these fundamental limitations, we present the Attend to Anything Model (AAM), a multi-modal foundation model that unifies attention modeling across various image, video, and audio-visual tasks and scenes. AAM reformulates attention as a cognitive entailment relationship organized in a general-to-specific hierarchy, implemented through language prompts with hierarchical embeddings in hyperbolic space.
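
The abstract does not say which hyperbolic model or distance AAM uses. As a minimal sketch of why hyperbolic space suits a general-to-specific hierarchy, the code below assumes the Poincaré ball model, a common choice for hierarchical embeddings. The function name and the sample points are hypothetical and not taken from the paper. In this model, general concepts are usually placed near the origin and specific ones near the boundary, where the same Euclidean gap corresponds to a much larger geodesic distance.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball.

    Assumption: Poincare ball model with curvature -1. The abstract does
    not state which hyperbolic model AAM actually uses.
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))
    return math.acosh(arg)


# Hypothetical 2-D points: the same Euclidean gap of 0.1, measured near
# the origin ("general" region) and near the boundary ("specific" region).
near_origin = poincare_distance((0.0, 0.0), (0.1, 0.0))
near_boundary = poincare_distance((0.85, 0.0), (0.95, 0.0))

print(f"gap of 0.1 near the origin:   {near_origin:.4f}")
print(f"gap of 0.1 near the boundary: {near_boundary:.4f}")
```

The distance near the boundary comes out several times larger than the one near the origin, so the ball has room for many well-separated specific concepts beneath a few general ones.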
