Conditional Collapse in Sign Language Production: A Diagnostic and a Scaling Argument

arXiv cs.CV | 2026-06-02 | Authors: Rui Hong, Jana Košecká

Abstract

arXiv:2606.01643v1. Sign Language Production (SLP) is the task of generating avatar sign language motion from natural language text. The quality of the generated motion is typically evaluated by a motion-space Fréchet distance (FID) and a back-translation (BT) BLEU score on benchmarks such as How2Sign. Both metrics can improve substantially while the underlying generator fails to faithfully represent the sign language gestures. In this work we propose to evaluate the generated motion at three independent levels: (τ1) initial-pose conditioning, (τ2) output diversity, and (τ3) target faithfulness. We compute these as pairwise-distance ratios using latent representations of a frozen motion autoencoder (MoAE). We evaluate 14 SLP model checkpoints on the How2Sign dataset, including a re-implemented Neural Sign Actors (NSA), and show that τ3 faithfulness is never…

The abstract may be incomplete; see the original article.
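
The abstract says the three levels are computed as pairwise-distance ratios over MoAE latents, but the truncated text does not give the formulas. The sketch below is therefore only an illustration of the general idea, not the authors' definition: `mean_pairwise_distance` and `diversity_ratio` are hypothetical names, the ratio shown (spread of generated latents over spread of reference latents) is an assumed stand-in for an output-diversity measure, and random vectors stand in for the latents a frozen encoder would produce.

```python
import itertools
import math
import random


def mean_pairwise_distance(latents):
    """Mean Euclidean distance over all distinct pairs of latent vectors."""
    pairs = list(itertools.combinations(latents, 2))
    if not pairs:
        raise ValueError("need at least two latent vectors")
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def diversity_ratio(generated, reference):
    """Hypothetical ratio: spread of generated latents / spread of reference latents.

    A value near 0 means the generator emits almost the same motion for
    every input; a value near 1 means its outputs are about as spread
    out as the reference motions. This is an assumed form, not the
    paper's formula.
    """
    return mean_pairwise_distance(generated) / mean_pairwise_distance(reference)


# Synthetic stand-ins for latents (ASSUMPTION: real use would encode
# motion sequences with a frozen motion autoencoder instead).
rng = random.Random(0)
dim, count = 16, 32
reference = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]

# A "collapsed" generator: every output sits near the reference mean.
center = [sum(column) / count for column in zip(*reference)]
collapsed = [[c + 1e-3 * rng.gauss(0.0, 1.0) for c in center] for _ in range(count)]

# A generator whose outputs track the reference motions closely.
faithful = [[x + 0.05 * rng.gauss(0.0, 1.0) for x in row] for row in reference]

collapsed_ratio = diversity_ratio(collapsed, reference)  # close to 0
faithful_ratio = diversity_ratio(faithful, reference)    # close to 1
print(f"collapsed: {collapsed_ratio:.4f}, faithful: {faithful_ratio:.4f}")
```

Normalizing by the reference spread makes the number comparable across latent spaces of different scale, which is one plausible reason to report ratios rather than raw distances.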