Convolutional Neural Networks for Distant Speech Recognition (paper)

2014, IEEE Signal Processing Letters. Citations: 258

Speech and Audio Processing, Speech Recognition and Synthesis, Music and Audio Processing

Abstract

We investigate convolutional neural networks (CNNs) for large vocabulary distant speech recognition, trained using speech recorded from a single distant microphone (SDM) and multiple distant microphones (MDM). In the MDM case we explore a beamformed signal input representation compared with the direct use of multiple acoustic channels as a parallel input to the CNN. We have explored different weight sharing approaches, and propose a channel-wise convolution with two-way pooling. Our experiments, using the AMI meeting corpus, found that CNNs improve the word error rate (WER) by 6.5% relative compared to conventional deep neural network (DNN) models and 15.7% over a discriminatively trained Gaussian mixture model (GMM) baseline. For cross-channel CNN training, the WER improves by 3.5% relative over the comparable DNN structure. Compared with the best beamformed GMM system, cross-channel convolution reduces the WER by 9.7% relative, and matches the accuracy of a beamformed DNN.
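The abstract reports its gains as relative WER reductions. As a minimal sketch of how such a figure is computed, the helper below uses the standard definition (absolute reduction divided by the baseline WER). The function name `relative_wer_reduction` and the WER values are hypothetical, chosen only for illustration; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction of new_wer versus baseline_wer, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs in percent (illustrative only, not taken from the paper).
baseline = 50.0
improved = 45.0
print(f"{relative_wer_reduction(baseline, improved):.1f}% relative")  # 10.0% relative
```

A relative reduction is always smaller in absolute terms than the same number of percentage points: in the illustrative case above, a 10% relative reduction corresponds to 5 absolute points of WER.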

Authors

Steve Renals

Arnab Ghoshal

Paweł Świętojański
