IRSTLM: an open source toolkit for handling large scale language models (paper)

2008 · Citations: 327

Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Abstract

Research in speech recognition and machine translation is boosting the use of large-scale n-gram language models. We present an open source toolkit that makes it possible to efficiently handle language models with billions of n-grams on conventional machines. The IRSTLM toolkit supports distributing n-gram collection and smoothing over a computer cluster, language model compression through probability quantization, and lazy loading of huge language models from disk. IRSTLM has so far been successfully deployed with the Moses toolkit for statistical machine translation and with the FBK-irst speech recognition system. The efficiency of the tool is reported on a speech transcription task of Italian political speeches using a language model of 1.1 billion four-grams.
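
As a rough illustration of what "compression through probability quantization" can mean, the sketch below replaces each log-probability with a small integer code that indexes a shared codebook. It is a generic equal-population binning scheme written for this page, not code taken from IRSTLM; the function names `build_codebook` and `quantize` and the default of 256 levels are assumptions made for the example.

```python
import bisect


def build_codebook(log_probs, levels=256):
    """Build a codebook by equal-population binning (illustrative only).

    The values are sorted and split into `levels` bins holding roughly the
    same number of entries; each bin is represented by its mean.
    """
    if not log_probs:
        raise ValueError("need at least one value to build a codebook")
    ordered = sorted(log_probs)
    levels = min(levels, len(ordered))  # never create empty bins
    codebook = []
    for k in range(levels):
        lo = k * len(ordered) // levels
        hi = (k + 1) * len(ordered) // levels
        bin_values = ordered[lo:hi]
        codebook.append(sum(bin_values) / len(bin_values))
    return codebook  # ascending, because the bins come from sorted values


def quantize(value, codebook):
    """Return the index of the codebook entry closest to `value`."""
    i = bisect.bisect_left(codebook, value)
    if i == 0:
        return 0
    if i == len(codebook):
        return len(codebook) - 1
    # Pick whichever neighbour is nearer; ties go to the lower entry.
    return i if codebook[i] - value < value - codebook[i - 1] else i - 1


if __name__ == "__main__":
    log_probs = [-5.0, -4.0, -1.0, -0.5]
    codebook = build_codebook(log_probs, levels=2)
    codes = [quantize(v, codebook) for v in log_probs]
    print(codebook, codes)
```

With at most 256 levels, each stored probability fits in one byte instead of a 4-byte float, at the cost of a small approximation error determined by the bin widths.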

Authors

Mauro Cettolo

Nicola Bertoldi

Marcello Federico
