A stochastic finite-state word-segmentation algorithm for Chinese (Paper)

1994 | Cited by 290
Natural Language Processing Techniques; Algorithms and Data Compression; Topic Modeling

Abstract

We present a stochastic finite-state model for segmenting Chinese text into dictionary entries and productively derived words, and providing pronunciations for these words; the method incorporates a class-based model in its treatment of personal names. We also evaluate the system's performance, taking into account the fact that people often do not agree on a single segmentation.
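The core idea of cost-based segmentation can be illustrated with a minimal sketch. It is not the system described in the abstract: it assumes a tiny hypothetical lexicon with invented counts and pinyin strings, scores each word by its negative log relative frequency, and finds the cheapest path through the text by dynamic programming, which is the best-path computation one would run over a weighted finite-state lattice. Productively derived words and the class-based personal-name model are left out, and the names `LEXICON`, `cost`, `segment`, and `pronounce` are illustrative only.

```python
import math

# Hypothetical toy lexicon: word -> (invented count, pinyin pronunciation).
# Real counts would come from a corpus; these are made up for illustration.
LEXICON = {
    "日": (50, "ri4"),
    "文": (40, "wen2"),
    "日文": (30, "ri4 wen2"),
    "文章": (60, "wen2 zhang1"),
    "章": (10, "zhang1"),
    "鱼": (40, "yu2"),
    "章鱼": (20, "zhang1 yu2"),
    "怎么": (80, "zen3 me5"),
    "说": (90, "shuo1"),
}
TOTAL = sum(count for count, _ in LEXICON.values())
MAX_LEN = max(len(word) for word in LEXICON)


def cost(word):
    """Negative log of the word's relative frequency (lower is better)."""
    return -math.log(LEXICON[word][0] / TOTAL)


def segment(text):
    """Return the lowest-cost split of `text` into lexicon words."""
    # best[i] holds (total cost, word list) for the cheapest analysis of text[:i].
    best = [(0.0, [])] + [(math.inf, None)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - MAX_LEN), end):
            word = text[start:end]
            if word in LEXICON and best[start][1] is not None:
                candidate = best[start][0] + cost(word)
                if candidate < best[end][0]:
                    best[end] = (candidate, best[start][1] + [word])
    if best[-1][1] is None:
        raise ValueError("no segmentation found with this lexicon")
    return best[-1][1]


def pronounce(text):
    """Pair each word of the best segmentation with its lexicon pronunciation."""
    return [(word, LEXICON[word][1]) for word in segment(text)]
```

With these invented counts, `segment("日文章鱼怎么说")` returns `["日文", "章鱼", "怎么", "说"]` rather than the competing analysis that begins `日 / 文章 / 鱼`, because the summed cost of two longer words is lower than that of three shorter ones.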