Copied Monolingual Data Improves Low-Resource Neural Machine Translation (paper)

2017 · Cited by 221
Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Abstract

We train a neural machine translation (NMT) system to both translate source-language text and copy target-language text, thereby exploiting monolingual corpora in the target language. Specifically, we create a bitext from the monolingual text in the target language so that each source sentence is identical to the target sentence. This copied data is then mixed with the parallel corpus and the NMT system is trained as usual, with no metadata to distinguish the two input languages.
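
The data preparation described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function names `make_copied_bitext` and `mix_corpora`, the in-memory list representation, and the seeded shuffle are all assumptions made for the example. The abstract does not state a mixing ratio or a shuffling policy, so none is implied here.

```python
import random


def make_copied_bitext(monolingual_target):
    """Pair each target-language sentence with itself as (source, target)."""
    return [(sentence, sentence) for sentence in monolingual_target]


def mix_corpora(parallel, monolingual_target, seed=0):
    """Concatenate real parallel pairs with copied pairs and shuffle them.

    No tag or other metadata is added to mark which pairs are copies,
    matching the abstract's "no metadata" setup. The shuffle and the
    seed are assumptions for this sketch.
    """
    mixed = list(parallel) + make_copied_bitext(monolingual_target)
    random.Random(seed).shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Hypothetical toy data: German-English pairs plus English-only text.
    parallel = [("ein Haus", "a house"), ("ein Hund", "a dog")]
    monolingual = ["the cat sleeps", "it rains today"]
    for src, tgt in mix_corpora(parallel, monolingual):
        print(f"{src}\t{tgt}")
```

The resulting list of sentence pairs would then be fed to whatever NMT training toolkit is in use, exactly as an ordinary parallel corpus would be.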