TraRA: Trajectory-level Recognition Aggregation for Video Text Spotting in Urban Surveillance

ArXiv CS.CV | 2026-06-08 | NEWS | en | Authors: Duc Tri Tran, Trung Thanh Nguyen, Vijay John, Phi Le Nguyen, Yasutomo Kawanishi

Details

Source site
ArXiv CS.CV
Authors
Duc Tri Tran, Trung Thanh Nguyen, Vijay John, Phi Le Nguyen, Yasutomo Kawanishi
Article type
NEWS
Language
en
Publication date
2026-06-08

Abstract

arXiv:2606.07161v1. Video Text Spotting (VTS) is essential for urban surveillance and intelligent transportation systems, enabling automated reading of street signs, vehicle markings, and scene text in video streams. However, reliable recognition remains challenging because dynamic video factors common in surveillance scenarios, including motion blur, occlusion, and scale variation, degrade frame-level recognition. Existing VTS methods typically recognize text independently on each frame, which leads to inconsistent and inaccurate results across sequences. To address these limitations, we propose TraRA (Trajectory-level Recognition Aggregation for VTS), a plug-and-play method that performs trajectory-level text recognition by leveraging temporal and multimodal consistency. TraRA integrates two key modules: (1) Temporal Clustering and (2) Vision-Language Aggregation.
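To illustrate the general idea of trajectory-level aggregation, as opposed to trusting each frame's reading on its own, here is a minimal sketch. It is not TraRA's Temporal Clustering or Vision-Language Aggregation, whose details the abstract does not give; it swaps in a plain confidence-weighted vote. The function name `aggregate_by_trajectory`, the `(track_id, text, confidence)` tuple layout, and the sample readings are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_by_trajectory(frame_results):
    """Pick one transcription per tracked text instance.

    frame_results: iterable of (track_id, text, confidence) tuples, one per
    frame-level recognition (hypothetical layout, not from the paper).
    Returns a dict mapping track_id to the transcription with the highest
    summed confidence across that trajectory.
    """
    # scores[track_id][text] accumulates confidence for each candidate string
    scores = defaultdict(lambda: defaultdict(float))
    for track_id, text, confidence in frame_results:
        scores[track_id][text] += confidence

    # Choose the candidate with the largest accumulated confidence per track
    return {
        track_id: max(candidates, key=candidates.get)
        for track_id, candidates in scores.items()
    }


# Made-up per-frame readings: blurred frames yield low-confidence misreads
frame_results = [
    (1, "STOP", 0.95),
    (1, "ST0P", 0.40),
    (1, "STOP", 0.90),
    (2, "EXIT", 0.80),
    (2, "EX1T", 0.30),
]

trajectory_text = aggregate_by_trajectory(frame_results)
print(trajectory_text)  # {1: 'STOP', 2: 'EXIT'}
```

Because the vote runs after frame-level recognition and only needs track IDs, a step of this kind could in principle be attached to an existing spotter without retraining it, which is the sense in which the abstract calls the method plug-and-play.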
