Analysis of CNN-based speech recognition system using raw speech as input (paper)

2015, cited by 279

Speech Recognition and Synthesis Speech and Audio Processing Music and Audio Processing

Abstract

Automatic speech recognition systems typically model the relationship between the acoustic speech signal and the phones in two separate steps: feature extraction and classifier training. In our recent works, we have shown that, in the framework of convolutional neural networks (CNN), the relationship between the raw speech signal and the phones can be directly modeled and ASR systems competitive with the standard approach can be built. In this paper, we first analyze and show that, between the first two convolutional layers, the CNN learns (in parts) and models the phone-specific spectral envelope information of 2-4 ms speech. Given that, we show that the CNN-based approach yields ASR trends similar to the standard short-term spectral based ASR system under mismatched (noisy) conditions, with the CNN-based approach being more robust.

Index Terms: automatic speech recognition, convolutional neural networks, raw signal, robust speech recognition.
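The abstract's claim that the early convolutional layers capture spectral information from 2-4 ms of raw speech can be illustrated with a minimal NumPy sketch. It is not the authors' model: the 16 kHz sampling rate, the 3 ms kernel length, and the Hann-windowed cosine standing in for a learned filter are all assumptions made here for illustration. The sketch inspects a short 1D kernel's magnitude frequency response and checks that convolving it with raw samples passes a tone near its centre frequency far more strongly than a distant one.

```python
import numpy as np

SAMPLE_RATE = 16_000                                # Hz; assumed, not stated in the abstract
KERNEL_MS = 3.0                                     # inside the 2-4 ms span the abstract mentions
KERNEL_LEN = int(SAMPLE_RATE * KERNEL_MS / 1000)    # 48 samples


def make_stand_in_kernel(center_hz):
    """Hann-windowed cosine: a hypothetical stand-in for a learned first-layer filter."""
    t = np.arange(KERNEL_LEN) / SAMPLE_RATE
    return np.hanning(KERNEL_LEN) * np.cos(2 * np.pi * center_hz * t)


def magnitude_response(kernel, n_fft=512):
    """Return (frequencies in Hz, magnitude of the kernel's zero-padded FFT)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / SAMPLE_RATE)
    mags = np.abs(np.fft.rfft(kernel, n=n_fft))
    return freqs, mags


def filter_energy(kernel, tone_hz, duration_s=0.05):
    """Energy of the kernel's output when convolved with a pure tone of raw samples."""
    t = np.arange(int(SAMPLE_RATE * duration_s)) / SAMPLE_RATE
    tone = np.sin(2 * np.pi * tone_hz * t)
    # np.convolve flips the kernel; the magnitude response is unaffected by the flip.
    out = np.convolve(tone, kernel, mode="valid")
    return float(np.sum(out ** 2))


kernel = make_stand_in_kernel(1000.0)
freqs, mags = magnitude_response(kernel)
peak_hz = float(freqs[np.argmax(mags)])      # frequency where the response is largest
in_band = filter_energy(kernel, 1000.0)      # tone at the kernel's centre frequency
out_band = filter_energy(kernel, 4000.0)     # tone far from the centre frequency
```

A kernel this short resolves frequency only coarsely, so a bank of such filters can describe a smooth spectral envelope rather than fine harmonic detail, which is in line with the abstract's description.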

Authors

Ronan Collobert

Mathew Magimai.-Doss

Dimitri Palaz
