Towards One-to-Many Temporal Grounding (Event)

BREAKTHROUGH | 2026-06-05 | Impact: HIGH

arXiv:2606.06294v1 | Announce Type: new

Abstract: Temporal Grounding (TG) aims to localize the video segments that correspond to a textual query. Prior research predominantly focuses on single-segment retrieval. Real-world scenarios, however, often require localizing multiple disjoint segments for a single query, a setting we term One-to-Many Temporal Grounding (OMTG). Previous state-of-the-art multimodal large language models (MLLMs), optimized for one-to-one settings, struggle in this context,
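
To make the one-to-many setting concrete, here is a minimal Python sketch of how a single query can map to several disjoint segments, and why a single-segment prediction falls short. The `(start, end)` representation in seconds, the helper names `merge_segments`, `total_length`, and `set_iou`, and the set-level temporal IoU score are all illustrative assumptions; the abstract does not describe the paper's data format or evaluation protocol.

```python
from typing import List, Tuple

# Assumption: a segment is a (start_sec, end_sec) pair; not taken from the paper.
Segment = Tuple[float, float]


def merge_segments(segments: List[Segment]) -> List[Segment]:
    """Sort segments and merge any that overlap or touch, giving a disjoint list."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous segment: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(segments: List[Segment]) -> float:
    """Total duration covered by a list of disjoint segments."""
    return sum(end - start for start, end in segments)


def set_iou(pred: List[Segment], gt: List[Segment]) -> float:
    """Temporal IoU between two sets of segments (hypothetical metric for illustration)."""
    pred = merge_segments(pred)
    gt = merge_segments(gt)
    inter = 0.0
    for ps, pe in pred:
        for gs, ge in gt:
            # Overlap of two intervals, clipped at zero when they are disjoint.
            inter += max(0.0, min(pe, ge) - max(ps, gs))
    union = total_length(pred) + total_length(gt) - inter
    return inter / union if union > 0 else 0.0


# Made-up example: the queried event occurs twice in the video.
ground_truth = [(5.0, 10.0), (30.0, 40.0)]
one_to_one_pred = [(5.0, 10.0)]                # a single-segment answer
one_to_many_pred = [(5.0, 10.0), (30.0, 40.0)]  # a multi-segment answer

print(set_iou(one_to_one_pred, ground_truth))
print(set_iou(one_to_many_pred, ground_truth))
```

Under this assumed metric, the single-segment answer is capped well below a perfect score even though its one segment is exactly right, because it misses the second occurrence entirely.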