Grounding or Guessing? Visual Signals for Detecting Hallucinations in Sign Language Translation

arXiv cs.CL · 2026-06-02 · News · Language: en · Authors: Yasser Hamidullah, Koel Dutta Chowdhury, Yusser Al Ghussin, Shakib Yazdani, Cennet Oguz, Josef van Genabith, Cristina España-Bonet

Abstract

arXiv:2510.18439v3 (Announce Type: replace)

Hallucination, where models generate fluent text unsupported by visual evidence, remains a major flaw in vision-language models and is particularly critical in sign language translation (SLT). In SLT, meaning depends on precise grounding in video, and gloss-free models are especially vulnerable because they map continuous signer movements directly into natural language without the intermediate gloss supervision that would otherwise serve as an alignment signal. We argue that hallucinations arise when models rely on language priors rather than visual input. To capture this, we propose a token-level reliability measure that quantifies how much the decoder uses visual information. Our method combines feature-based sensitivity, which measures internal changes when video is masked, with counterfactual signals, which capture probability differences between clean and altered video inputs.
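The abstract names the two ingredients of the reliability measure but gives no formulas here. The following is a minimal sketch of how such a per-token score could be assembled. It rests on three assumptions:

1. Feature-based sensitivity is the cosine distance between the decoder's per-token hidden states with clean video and with masked video.
2. The counterfactual signal is the drop in log-probability of each emitted token when the video is altered.
3. The two signals are blended with an equal weight `alpha = 0.5`, after the drop is squashed with `1 - exp(-d)`.

These are illustrative choices and not the authors' formulation. All function names, the weighting, and the `threshold` value are hypothetical. A low score marks a token that the decoder would have produced about as readily without the video, which is the language-prior behavior the abstract links to hallucination.

```python
# Hypothetical sketch of a token-level visual-reliability score.
# NOT the paper's implementation: distance metric, squashing, and weights are assumptions.
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity between two hidden states, clipped to [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as maximally changed
    return min(1.0, max(0.0, 1.0 - dot / (norm_a * norm_b)))


def feature_sensitivity(clean_states: Sequence[Vector],
                        masked_states: Sequence[Vector]) -> List[float]:
    """Per-token change in decoder hidden states when the video is masked."""
    return [cosine_distance(c, m) for c, m in zip(clean_states, masked_states)]


def counterfactual_signal(clean_logprobs: Sequence[float],
                          altered_logprobs: Sequence[float]) -> List[float]:
    """Per-token drop in log-probability of the emitted token under altered video.

    Large positive values mean the token depended on the video; values near
    zero (or negative) mean it was predicted just as well without it.
    """
    return [c - a for c, a in zip(clean_logprobs, altered_logprobs)]


def reliability(clean_states: Sequence[Vector], masked_states: Sequence[Vector],
                clean_logprobs: Sequence[float], altered_logprobs: Sequence[float],
                alpha: float = 0.5) -> List[float]:
    """Blend both signals into one per-token score in [0, 1] (alpha is assumed)."""
    sens = feature_sensitivity(clean_states, masked_states)
    drops = counterfactual_signal(clean_logprobs, altered_logprobs)
    # Squash the non-negative log-prob drop into [0, 1) so both terms share a scale.
    cf = [1.0 - math.exp(-max(0.0, d)) for d in drops]
    return [alpha * s + (1.0 - alpha) * c for s, c in zip(sens, cf)]


def flag_tokens(scores: Sequence[float], threshold: float = 0.3) -> List[int]:
    """Indices of tokens whose low score suggests a guess from language priors."""
    return [i for i, s in enumerate(scores) if s < threshold]


if __name__ == "__main__":
    # Toy data for two output tokens (2-d hidden states, made up for illustration).
    clean_states = [[1.0, 0.0], [0.0, 1.0]]
    masked_states = [[1.0, 0.0], [1.0, 0.0]]
    clean_lp = [-0.5, -0.2]
    altered_lp = [-0.5, -3.2]
    scores = reliability(clean_states, masked_states, clean_lp, altered_lp)
    print([round(s, 3) for s in scores])  # [0.0, 0.975]
    print(flag_tokens(scores))            # [0]
```

In the toy run, the first token is flagged because masking the video changed neither its hidden state nor its probability. Applying the idea to a real SLT model would presumably need two forward passes (clean and masked or altered video) with the output sequence held fixed, so that hidden states and log-probabilities can be compared token by token.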
