A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005 (Paper)
2005 · Cited by 337
Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
Abstract
We present a Chinese word segmentation system submitted to the closed track of Sighan Bakeoff 2005. Our segmenter was built using a conditional random field sequence model, which provides a framework for using a large number of linguistic features, such as character identity, morphological features, and character reduplication features. Because our morphological features were extracted from the training corpora automatically, our system was not biased toward any particular variety of Mandarin. Thus, our system does not
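To make the feature description in the abstract concrete, the following is a minimal sketch of how character-level features of this kind might be computed before being passed to a CRF trainer. It is not the authors' implementation. The function names (`char_features`, `labels_to_words`), the feature names, the boundary markers `<BOS>`/`<EOS>`, and the two-label scheme (`B` for a character that starts a word, `I` for one that continues it) are all assumptions made for illustration. The morphological features mentioned in the abstract are omitted because the abstract does not say how they were derived.

```python
def char_features(chars, i):
    """Return a feature dict for the character at position i.

    Covers character identity, neighbouring context, and simple
    reduplication checks. All feature names are hypothetical.
    """
    prev_c = chars[i - 1] if i > 0 else "<BOS>"
    next_c = chars[i + 1] if i + 1 < len(chars) else "<EOS>"
    return {
        "char": chars[i],                  # character identity
        "prev_char": prev_c,               # left neighbour
        "next_char": next_c,               # right neighbour
        "prev_bigram": prev_c + chars[i],  # left character bigram
        "next_bigram": chars[i] + next_c,  # right character bigram
        "redup_prev": chars[i] == prev_c,  # same as previous character
        "redup_next": chars[i] == next_c,  # same as next character
    }


def labels_to_words(chars, labels):
    """Rebuild words from per-character labels (assumed scheme: B/I)."""
    words = []
    for c, label in zip(chars, labels):
        if label == "B" or not words:
            words.append(c)   # start a new word
        else:
            words[-1] += c    # extend the current word
    return words


if __name__ == "__main__":
    sentence = list("天天向上")
    features = [char_features(sentence, i) for i in range(len(sentence))]
    print(features[0]["redup_next"])
    print(labels_to_words(sentence, ["B", "I", "B", "I"]))
```

In a complete system, the per-character feature dicts would be fed to a sequence labeller, and the predicted labels converted back into words with a helper like `labels_to_words`.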