PASQA: Pitch-Accent-Focused Speech Quality Assessment Model Trained on Synthetic Speech with Accent Errors

ArXiv CS.CL | 2026-06-19 | NEWS | en | Authors: Masaya Kawamura, Yuma Shirahata, Kentaro Mitsui, Reo Shimizu

Details

Source site
ArXiv CS.CL
Authors
Masaya Kawamura, Yuma Shirahata, Kentaro Mitsui, Reo Shimizu
Article type
NEWS
Language
en
Publication date
2026-06-19

Abstract

arXiv:2606.20137v1 (announce type: cross)

Existing mean opinion score (MOS) prediction models typically predict utterance-level naturalness MOS and can be insensitive to localized pitch-accent errors. We propose Pitch-Accent-focused Speech Quality Assessment (PASQA), which explicitly targets pitch-accent correctness. To train our model, we construct a controlled Japanese accent-error dataset by changing accent patterns using an accent-controllable text-to-speech system, and compute a pseudo accent-quality score from the accent-error rate. PASQA builds on self-supervised representations and employs mora-conditioned fusion, ranking loss, an auxiliary accent-error localization task, and speaker-invariant training. Experiments show that conventional models fail to preserve the ordering by accent-error severity, whereas PASQA achieves high ordering accuracy on both seen and unseen speakers. Further, PASQA shows stronger agreement with human accent-correctness judgments.
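
To make the scoring and ordering ideas in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation, and the abstract does not give exact definitions, so everything below is an assumption for illustration: accent patterns are written as per-mora high/low labels (`"H"`/`"L"`), the pseudo accent-quality score is taken to be one minus the accent-error rate, the ranking loss is a plain pairwise hinge loss, and all function names (`accent_error_rate`, `pseudo_score`, `pairwise_hinge_ranking_loss`, `ordering_accuracy`) are hypothetical.

```python
from itertools import combinations


def accent_error_rate(reference, altered):
    """Fraction of moras whose H/L label differs from the reference pattern."""
    if not reference or len(reference) != len(altered):
        raise ValueError("patterns must be non-empty and of equal length")
    mismatches = sum(r != a for r, a in zip(reference, altered))
    return mismatches / len(reference)


def pseudo_score(reference, altered):
    """Assumed pseudo accent-quality score: 1 minus the accent-error rate."""
    return 1.0 - accent_error_rate(reference, altered)


def pairwise_hinge_ranking_loss(pred, target, margin=0.1):
    """Mean hinge loss over all pairs whose target scores differ.

    A pair contributes zero loss when the predicted scores are ordered
    like the targets and separated by at least `margin`.
    """
    losses = []
    for i, j in combinations(range(len(pred)), 2):
        if target[i] == target[j]:
            continue  # tied targets carry no ordering information
        sign = 1.0 if target[i] > target[j] else -1.0
        losses.append(max(0.0, margin - sign * (pred[i] - pred[j])))
    return sum(losses) / len(losses) if losses else 0.0


def ordering_accuracy(pred, target):
    """Share of non-tied pairs that the predictions rank in the target order."""
    correct = total = 0
    for i, j in combinations(range(len(pred)), 2):
        if target[i] == target[j]:
            continue
        total += 1
        if (pred[i] - pred[j]) * (target[i] - target[j]) > 0:
            correct += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    reference = "LHHH"  # hypothetical 4-mora reference pattern
    variants = ["LHHH", "LHHL", "HLLL"]  # increasingly severe accent errors
    targets = [pseudo_score(reference, v) for v in variants]
    predictions = [0.9, 0.5, 0.1]  # made-up model outputs
    print(targets)
    print(ordering_accuracy(predictions, targets))
    print(pairwise_hinge_ranking_loss(predictions, targets))
```

In this toy setup, a model that only tracks overall naturalness could assign nearly identical scores to all three variants and so lose the ordering by error severity, which is the failure the abstract attributes to conventional MOS predictors.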
