Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition (paper)

2017 · Citations: 298

Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Abstract

We present results that show it is possible to build a competitive, greatly simplified, large vocabulary continuous speech recognition system with whole words as acoustic units. We model the output vocabulary of about 100,000 words directly using deep bi-directional LSTM RNNs with CTC loss. The model is trained on 125,000 hours of semi-supervised acoustic training data, which enables us to alleviate the data sparsity problem for word models. We show that the CTC word models work very well as an end-to-end all-neural speech recognition model without the use of traditional context-dependent sub-word phone units that require a pronunciation lexicon, and without any language model, removing the need to decode. We demonstrate that the CTC word models perform better than a strong, more complex, state-of-the-art baseline with sub-word units.
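
To make concrete what "removing the need to decode" can look like for a CTC model with word outputs, here is a minimal sketch of the standard best-path CTC readout: take the top-scoring output per frame, merge consecutive repeats, and drop the blank symbol. The abstract does not describe the readout procedure, so this is an illustration under assumptions rather than the authors' method; the function name `ctc_greedy_words`, the blank index, the three-entry vocabulary, and all scores are hypothetical.

```python
from itertools import groupby

BLANK = 0  # index reserved for the CTC blank symbol (an assumption of this sketch)


def ctc_greedy_words(frame_scores, vocab):
    """Best-path CTC readout: argmax per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring output index for each acoustic frame.
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    # Merge consecutive duplicates, then remove the blank symbol.
    collapsed = [idx for idx, _ in groupby(best) if idx != BLANK]
    return [vocab[idx] for idx in collapsed]


# Toy vocabulary and per-frame scores; every value here is made up for illustration.
vocab = ["<blank>", "hello", "world"]
frame_scores = [
    [0.10, 0.80, 0.10],  # "hello"
    [0.20, 0.70, 0.10],  # "hello" again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.80],  # "world"
    [0.60, 0.20, 0.20],  # blank
]
print(ctc_greedy_words(frame_scores, vocab))  # ['hello', 'world']
```

With whole words as output units, this readout yields a word sequence directly, with no pronunciation lexicon or language model search in between. A real system would have about 100,000 entries in place of the toy vocabulary.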

Authors

Haşim Sak

Hank Liao

Hagen Soltau
