Hierarchical Local-Global Transformer for Temporal Sentence Grounding

ArXiv CS.CV · 2026-05-26 · News · English · Authors: Xiang Fang, Daizong Liu, Pan Zhou, Zichuan Xu, Ruixuan Li

Abstract

arXiv:2208.14882v2 (Announce Type: replace-cross). This paper studies temporal sentence grounding (TSG), a multimedia problem that aims to accurately locate the specific segment of an untrimmed video that matches a given sentence query. Traditional TSG methods mainly follow either a top-down or a bottom-up framework and are not end-to-end: they rely heavily on time-consuming post-processing to refine the grounding results. Recently, several transformer-based approaches have been proposed to model the fine-grained semantic alignment between video and query efficiently and effectively. Although these methods achieve strong performance to some extent, they treat video frames and query words as equal transformer inputs for correlation, and therefore fail to capture the different levels of granularity and the distinct semantics of the two modalities.
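To make the critique concrete, here is a minimal sketch, under our own simplifying assumptions, of the generic "flat" input the abstract describes. Frame features and word features are concatenated into one sequence and passed through a single scaled dot-product self-attention step, so every token attends to every other token at the same level of granularity. The function name `joint_self_attention`, the feature sizes, and the use of identity query/key/value projections (no learned weights) are illustrative assumptions. This is not the paper's Hierarchical Local-Global Transformer.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_self_attention(frames, words):
    """Hypothetical single-head self-attention over one flat sequence.

    frames: list of T frame feature vectors, each of length d
    words:  list of L word feature vectors, each of length d
    Returns (outputs, weights) for the T + L concatenated tokens.
    Assumption: queries, keys, and values are the raw features
    (identity projections) to keep the sketch dependency-free.
    """
    # Frames and words become one undifferentiated token sequence.
    tokens = frames + words
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)

    # Each token scores every other token, regardless of modality.
    weights = []
    for q in tokens:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights.append(softmax(scores))

    # Each output is the attention-weighted sum of all token features.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


# Usage: 6 hypothetical frame features and 3 word features of size 4.
random.seed(0)
d = 4
frames = [[random.gauss(0, 1) for _ in range(d)] for _ in range(6)]
words = [[random.gauss(0, 1) for _ in range(d)] for _ in range(3)]
out, attn = joint_self_attention(frames, words)
print(len(out), len(attn[0]))  # 9 9
```

Because nothing in this attention step distinguishes a frame token from a word token, any frame-level versus sentence-level structure has to be learned implicitly, which is the limitation the authors point to.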
