SeqTrack: Sequence to Sequence Learning for Visual Object Tracking (Paper)

2023, cited by 380

Video Surveillance and Tracking Methods; Fire Detection and Safety Systems; Visual Attention and Saliency Detection

Abstract

In this paper, we present a new sequence-to-sequence learning framework for visual tracking, dubbed SeqTrack. It casts visual tracking as a sequence generation problem, which predicts object bounding boxes in an autoregressive fashion. This is different from prior Siamese trackers and transformer trackers, which rely on designing complicated head networks, such as classification and regression heads. SeqTrack only adopts a simple encoder-decoder transformer architecture. The encoder extracts visual features with a bidirectional transformer, while the decoder generates a sequence of bounding box values autoregressively with a causal transformer. The loss function is a plain cross-entropy. Such a sequence learning paradigm not only simplifies the tracking framework, but also achieves competitive performance on benchmarks. For instance, SeqTrack gets 72.5% AUC on LaSOT, establishing a new state-of-the-art performance. Code and models are available at https://github.com/microsoft/VideoX.
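To make the idea in the abstract concrete, the following is a minimal sketch of how a bounding box might be treated as a short token sequence, scored with plain cross-entropy, and generated one token at a time. It is not the authors' implementation: the bin count `N_BINS`, the `START` token id, the coordinate order, and the helper names are all assumptions chosen for illustration, and the `step_fn` callable stands in for the causal transformer decoder.

```python
import math

N_BINS = 1000      # assumed number of coordinate bins (hypothetical, not from the paper)
START = N_BINS     # assumed id of a start-of-sequence token (hypothetical)

def box_to_tokens(box, n_bins=N_BINS):
    """Quantize a box with coordinates normalized to [0, 1] into integer tokens."""
    return [min(n_bins - 1, max(0, int(v * n_bins))) for v in box]

def tokens_to_box(tokens, n_bins=N_BINS):
    """Map tokens back to the centers of their bins in [0, 1]."""
    return [(t + 0.5) / n_bins for t in tokens]

def cross_entropy(logits, target):
    """Cross-entropy for one decoding step: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

def greedy_decode(step_fn, length=4):
    """Generate `length` tokens autoregressively.

    `step_fn(prefix)` is a stand-in for the decoder: it receives the tokens
    generated so far and returns logits over the vocabulary for the next token.
    """
    seq = [START]
    for _ in range(length):
        logits = step_fn(seq)
        seq.append(max(range(len(logits)), key=logits.__getitem__))
    return seq[1:]

# Toy usage: a fake "decoder" that always favors the ground-truth token.
target = box_to_tokens([0.25, 0.5, 0.125, 0.75])

def toy_step(prefix):
    logits = [0.0] * N_BINS
    logits[target[len(prefix) - 1]] = 5.0
    return logits

decoded = greedy_decode(toy_step)
loss = sum(cross_entropy(toy_step([START] + target[:i]), t)
           for i, t in enumerate(target)) / len(target)
```

In this sketch the training signal is just the average per-token cross-entropy under teacher forcing, which is why no separate classification or regression head appears anywhere; the quantization step is what turns continuous box regression into token prediction.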

Authors (4)

Xin Chen

Hu Han

Huchuan Lu

Houwen Peng
