cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models (paper)
Abstract
We present cdec, an open source framework for decoding, aligning with, and training a number of statistical machine translation models, including word-based models, phrase-based models, and models based on synchronous context-free grammars. Using a single unified internal representation for translation forests, the decoder strictly separates model-specific translation logic from general rescoring, pruning, and inference algorithms. From this unified representation, the decoder can extract not only the 1- or k-best translations, but also alignments to a reference, or the quantities necessary to drive discriminative training using gradient-based or gradient-free optimization techniques. Its efficient C++ implementation means that memory use and runtime performance are significantly better than comparable decoders.
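To make the idea of a unified forest representation more concrete, the following is a minimal Python sketch, not cdec's actual C++ API. All names here (`Edge`, `Forest`, `inside`, `logaddexp`) and the toy probabilities are hypothetical and chosen only for illustration. It assumes a forest stored as a hypergraph whose nodes are numbered in bottom-up order, and shows how one generic inference routine can yield either the 1-best score or the log partition function (a quantity used in gradient-based training) just by swapping the operator that combines competing edges.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Edge:
    head: int      # node that this edge derives
    tails: tuple   # child node ids (empty tuple for a leaf edge)
    weight: float  # log-score contributed by the edge itself


@dataclass
class Forest:
    # Assumption: node ids are in bottom-up (topological) order,
    # so every tail id is smaller than the head id of its edge.
    num_nodes: int
    edges: list = field(default_factory=list)

    def incoming(self, node):
        return [e for e in self.edges if e.head == node]


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def inside(forest, plus):
    """Generic bottom-up pass over the forest in log space.

    `plus` decides how alternative edges into a node are combined:
    `max` gives the Viterbi (1-best) score, `logaddexp` gives the
    log of the total score of all derivations.
    """
    value = [-math.inf] * forest.num_nodes
    for node in range(forest.num_nodes):
        acc = -math.inf
        for e in forest.incoming(node):
            score = e.weight + sum(value[t] for t in e.tails)
            acc = plus(acc, score)
        value[node] = acc
    return value


# Toy forest: node 0 and node 1 each have two competing leaf edges,
# and node 2 (the goal) concatenates one choice from each.
forest = Forest(num_nodes=3, edges=[
    Edge(0, (), math.log(0.6)),
    Edge(0, (), math.log(0.4)),
    Edge(1, (), math.log(0.7)),
    Edge(1, (), math.log(0.3)),
    Edge(2, (0, 1), 0.0),
])

viterbi = inside(forest, max)        # best single derivation
log_z = inside(forest, logaddexp)    # sum over all derivations
```

In this sketch the model-specific part is only whatever builds the `Forest`; the `inside` routine never needs to know whether the edges came from a word-based, phrase-based, or synchronous context-free model.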