Improved statistical machine translation using paraphrases (Paper)

2006 | Citations: 295

Topics: Natural Language Processing Techniques; Topic Modeling; Text Readability and Simplification

Abstract

Parallel corpora are crucial for training SMT systems. However, for many language pairs they are available only in very limited quantities. For these language pairs a huge portion of phrases encountered at run-time will be unknown. We show how techniques from paraphrasing can be used to deal with these otherwise unknown source language phrases. Our results show that augmenting a state-of-the-art SMT system with paraphrases leads to significantly improved coverage and translation quality. For a training corpus with 10,000 sentence pairs we increase the coverage of unique test set unigrams from 48% to 90%, with more than half of the newly covered items accurately translated, as opposed to none in current approaches.
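The coverage figure quoted in the abstract can be illustrated with a small sketch. It assumes a toy phrase table and a toy paraphrase dictionary; the names `coverage`, `phrase_table`, and `paraphrases` and all the data are hypothetical. This is not the authors' implementation, only one way to count how many unique test unigrams become translatable once paraphrases are allowed as a fallback.

```python
def coverage(test_tokens, phrase_table, paraphrases=None):
    """Fraction of unique test unigrams that can be translated,
    either directly or through a paraphrase found in the phrase table."""
    unique = set(test_tokens)
    if not unique:
        return 0.0
    covered = 0
    for word in unique:
        if word in phrase_table:
            # Direct hit: the word was seen in the training data.
            covered += 1
        elif paraphrases and any(p in phrase_table for p in paraphrases.get(word, ())):
            # Fallback: an unseen word is covered if one of its paraphrases was seen.
            covered += 1
    return covered / len(unique)


# Toy data (hypothetical, for illustration only).
phrase_table = {"house": ["casa"], "big": ["grande"]}
paraphrases = {"large": ["big"], "home": ["house"]}
test_tokens = ["big", "large", "home", "garden", "big"]

baseline = coverage(test_tokens, phrase_table)
augmented = coverage(test_tokens, phrase_table, paraphrases)
print(baseline, augmented)
```

In this toy setting only "big" is covered directly, while "large" and "home" become covered through their paraphrases and "garden" stays unknown. Coverage alone says nothing about whether the newly covered items are translated accurately, which the abstract reports separately.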

Authors

Miles Osborne

Philipp Koehn

Chris Callison-Burch
