OmniInteract: Benchmarking Real-World Streaming Interaction for Real-Time Omnimodal Assistants

ArXiv CS.CV, 2026-05-27
Authors: Xudong Lu, Xueying Li, Annan Wang, Yang Bo, Jinpeng Chen, Zengliang Li, Nianzu Yang, Rui Liu, Xue Yang, Jingwen Hou, Hongsheng Li

Abstract

arXiv:2605.26485v1, announce type: new

We introduce OmniInteract, a streaming benchmark for real-time omnimodal large language models evaluated through native online inference over audio-visual streams. Unlike offline video understanding or text-prompted streaming QA, OmniInteract preserves the original audio-visual stream and requires models to process it online, without access to future content. User queries and ambient sounds are embedded in the audio track, requiring models to detect multimodal triggers, decide when to respond, and answer while the stream unfolds. OmniInteract contains 250 videos with 1,430 temporally grounded response slots: 1,062 1Q1A slots across real-time, proactive, and nested scenarios, and 368 1QnA slots for continuous task monitoring and step guidance. Each slot includes a trigger, response window, and target answer.
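To make the slot structure concrete, here is a minimal Python sketch of how a temporally grounded response slot and a timing check might be represented. The abstract only states that each slot has a trigger, a response window, and a target answer, so everything else is an assumption made for illustration: the class name `ResponseSlot`, the field names, the use of seconds as the time unit, and the helpers `is_timely` and `visible_prefix`. None of it is the benchmark's actual schema or scoring procedure.

```python
from dataclasses import dataclass
from typing import List, Literal, Tuple


@dataclass(frozen=True)
class ResponseSlot:
    """Hypothetical record for one response slot (field names are assumptions)."""
    slot_type: Literal["1Q1A", "1QnA"]  # the two slot families named in the abstract
    trigger_time: float                 # assumed: seconds into the stream when the trigger occurs
    window_start: float                 # assumed: earliest acceptable response time, in seconds
    window_end: float                   # assumed: latest acceptable response time, in seconds
    target_answer: str                  # reference answer for this slot


def is_timely(slot: ResponseSlot, response_time: float) -> bool:
    """Return True if a response lands inside the slot's response window (inclusive)."""
    return slot.window_start <= response_time <= slot.window_end


def visible_prefix(stream: List[Tuple[float, str]], now: float) -> List[Tuple[float, str]]:
    """Keep only (timestamp, chunk) items at or before `now`.

    This mimics the online constraint: no access to future content.
    """
    return [item for item in stream if item[0] <= now]


# Illustrative values only; they are not taken from the benchmark.
slot = ResponseSlot("1Q1A", trigger_time=12.0, window_start=12.0,
                    window_end=18.0, target_answer="example answer")
stream = [(0.0, "chunk-a"), (10.0, "chunk-b"), (20.0, "chunk-c")]

on_time = is_timely(slot, 15.0)     # inside the window
too_late = is_timely(slot, 20.0)    # after the window closes
seen = visible_prefix(stream, 12.0)  # only the chunks available at trigger time
```

The slot counts reported in the abstract are consistent with each other: 1,062 1Q1A slots plus 368 1QnA slots gives the stated total of 1,430.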
