Phonetic Error Analysis of Raw Waveform Acoustic Models 文章

ArXiv CS.CL2026-06-08NEWSen作者: Erfan Loweimi, Zhengjun Yue, Andrea Carmantini, Zoran Cvetkovic, Steve Renals, Peter Bell

查看原文 →

关系图谱

摘要

arXiv:2606.07030v1 Announce Type: cross Abstract: We analyse error patterns of raw waveform acoustic models on TIMIT phone recognition beyond the overall phone error rate (PER). PER is decomposed across three broad phonetic class (BPC) categorisations, and confusion matrices are constructed from substitution errors. Our models combine parametric (SincNet, Sinc2Net) or non-parametric CNNs with Bidirectional LSTMs, achieving 13.9%/15.3% PER on Dev/Test, the best reported results for raw waveform models on TIMIT. Transfer learning from WSJ reduces PER to 11.3%/12.3%, surpassing the Filterbank baseline. Per-BPC analysis reveals that BLSTM layers benefit transition-dependent classes most, while WSJ transfer learning improves consonants roughly three times more than vowels. Confusion patterns are consistent across raw waveform and Filterbank systems, indicating that the dominant confusions reflect inherent phonetic similarities.

Phonetic Error Analysis of Raw Waveform Acoustic Models 文章

摘要

相关事件查看全部 (1)

相关公司

相关人物

相关产品查看全部 (4)

相关技术查看全部 (6)