On the Fallacy of Global Token Perplexity in Spoken Language Model Evaluation

ArXiv CS.CL, 2026-05-28. Authors: Chan-Jan Hsu, Liang-Hsuan Tseng, Yi-Cheng Lin, Yen-Chun Kuo, Ju-Chieh Chou, Kai-Wei Chang, Hung-yi Lee, Carlos Busso

Abstract

arXiv:2601.06329v2. Generative spoken language models pretrained on large-scale raw audio can continue a speech prompt with appropriate content while preserving attributes like speaker and emotion, serving as foundation models for spoken dialogue. In prior literature, these models are often evaluated using "global token perplexity", which directly applies the text perplexity formulation to speech tokens. However, this practice overlooks fundamental differences between speech and text modalities, possibly leading to an underestimation of the speech characteristics. In this work, we propose a variety of likelihood- and generative-based evaluation methods that serve in place of naive global token perplexity. We demonstrate that the proposed evaluations more faithfully reflect perceived generation quality, as evidenced by stronger correlations with human-rated mean opinion scores (MOS).
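
For reference, the text perplexity formulation that the abstract refers to can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: the function name `global_token_perplexity` and the input layout (one list of per-token natural-log probabilities per utterance) are hypothetical, and reading "global" as pooling all tokens across utterances before averaging is an assumption.

```python
import math
from typing import Sequence


def global_token_perplexity(log_probs_per_utt: Sequence[Sequence[float]]) -> float:
    """Perplexity computed over all tokens pooled across utterances.

    Each inner sequence holds the natural-log probabilities
    log p(x_t | x_<t) that some model assigned to the tokens of one
    utterance (hypothetical input layout, assumed for illustration).
    """
    total_log_prob = 0.0
    total_tokens = 0
    for utt in log_probs_per_utt:
        total_log_prob += sum(utt)
        total_tokens += len(utt)
    if total_tokens == 0:
        raise ValueError("need at least one token")
    # Perplexity is exp of the negative mean log-likelihood per token.
    return math.exp(-total_log_prob / total_tokens)
```

Under this pooled definition, a model that assigns probability 0.25 to every token gets a perplexity of about 4, however the tokens are split into utterances. Every speech token is weighted equally, exactly as text tokens would be.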
