SAGE: Segment-Aware Gloss-Free Encoding for Token-Efficient Sign Language Translation

ArXiv CS.CV | 2026-05-29 | NEWS | en | Authors: JianHe Low, Ozge Mercanoglu Sincan, Richard Bowden

Details

Source: ArXiv CS.CV
Authors: JianHe Low, Ozge Mercanoglu Sincan, Richard Bowden
Article type: NEWS
Language: en
Published: 2026-05-29

Summary

arXiv:2507.09266v2 (Announce Type: replace)

Abstract: Gloss-free Sign Language Translation (SLT) has advanced rapidly, achieving strong performance without relying on gloss annotations. However, these gains have often come with increased model complexity and high computational demands, raising concerns about scalability, especially as large-scale sign language datasets become more common. We propose a segment-aware visual tokenization framework that leverages sign segmentation to convert continuous video into discrete, sign-informed visual tokens. This reduces input sequence length by up to 50% compared to prior methods, resulting in up to 2.67x lower memory usage and better scalability on larger datasets. To bridge the visual and linguistic modalities, we introduce a token-to-token contrastive alignment objective, along with dual-level supervision that aligns both language embeddings and intermediate hidden states.
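The abstract does not specify how the tokenizer or the alignment loss are implemented, so the following is only a minimal, stdlib-only sketch of the two ideas under stated assumptions. It assumes sign segments arrive as half-open `[start, end)` frame ranges (in SAGE they come from a sign segmentation step), that each segment is mean-pooled into a single visual token, and that the token-to-token contrastive objective can be approximated by a symmetric InfoNCE loss over one-to-one paired visual and text tokens. The names `segment_pool`, `info_nce` and `dual_level_loss`, the temperature default of 0.07, and the pairing scheme are all hypothetical illustrations, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def segment_pool(frames: Sequence[Vector], segments: Sequence[Tuple[int, int]]) -> List[Vector]:
    """Mean-pool per-frame features inside each half-open [start, end) sign segment.

    Produces one visual token per segment, so the output length is
    len(segments) rather than len(frames).
    """
    tokens: List[Vector] = []
    for start, end in segments:
        if not 0 <= start < end <= len(frames):
            raise ValueError(f"invalid segment ({start}, {end}) for {len(frames)} frames")
        span = frames[start:end]
        dim = len(span[0])
        tokens.append([sum(frame[d] for frame in span) / len(span) for d in range(dim)])
    return tokens


def _l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
    return [x / norm for x in v]


def _cross_entropy_diagonal(logits: List[Vector]) -> float:
    """Mean cross-entropy where row i's correct class is column i (log-sum-exp stabilised)."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def info_nce(visual: Sequence[Vector], text: Sequence[Vector], temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over paired tokens; visual[i] is assumed to match text[i]."""
    if len(visual) != len(text) or not visual:
        raise ValueError("expected equally long, non-empty token lists")
    v = [_l2_normalize(x) for x in visual]
    t = [_l2_normalize(x) for x in text]
    # Cosine similarity of every visual token with every text token, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t] for vi in v]
    transposed = [list(col) for col in zip(*logits)]
    # Average the visual-to-text and text-to-visual directions.
    return 0.5 * (_cross_entropy_diagonal(logits) + _cross_entropy_diagonal(transposed))


def dual_level_loss(vis_emb, txt_emb, vis_hidden, txt_hidden, hidden_weight: float = 1.0) -> float:
    """Hypothetical dual-level objective: the same contrastive loss applied to
    embedding-level tokens and to intermediate hidden states, then summed."""
    return info_nce(vis_emb, txt_emb) + hidden_weight * info_nce(vis_hidden, txt_hidden)


if __name__ == "__main__":
    frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # 4 frames, 2-dim features
    tokens = segment_pool(frames, [(0, 2), (2, 4)])
    print(tokens)  # [[1.0, 0.0], [0.0, 1.0]]  (4 frames -> 2 tokens)
    aligned = info_nce(tokens, [[1.0, 0.0], [0.0, 1.0]])
    swapped = info_nce(tokens, [[0.0, 1.0], [1.0, 0.0]])
    print(aligned < swapped)  # True: correctly paired tokens give the lower loss
```

In this toy input, pooling two-frame segments halves the sequence length the translation model would see. Mean pooling is simply the easiest stand-in to read; a real segment-aware tokenizer could instead run a learned encoder over each segment.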
