Unsupervised models for morpheme segmentation and morphology learning (paper)

2007 · ACM Transactions on Speech and Language Processing · Cited by 335

Topics: Natural Language Processing Techniques; Topic Modeling; Speech and dialogue systems

Abstract

We present a model family called Morfessor for the unsupervised induction of a simple morphology from raw text data. The model is formulated in a probabilistic maximum a posteriori framework. Morfessor can handle highly inflecting and compounding languages where words can consist of lengthy sequences of morphemes. A lexicon of word segments, called morphs, is induced from the data. The lexicon stores information about both the usage and form of the morphs. Several instances of the model are evaluated quantitatively in a morpheme segmentation task on different-sized sets of Finnish as well as English data. Morfessor is shown to perform very well compared to a widely known benchmark algorithm, in particular on Finnish data.
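The abstract describes a maximum a posteriori criterion over a lexicon of morphs, with each word segmented into a sequence of morphs. As a rough illustration only, the sketch below shows two pieces of such a model under simplifying assumptions: a Viterbi search for the most probable split of a word given unigram morph log-probabilities, and a two-part cost (lexicon cost plus corpus cost) that a MAP search would try to minimize. This is not the paper's model or the API of any Morfessor release: the function names `viterbi_segment` and `map_cost`, the `alphabet_size` parameter, the uniform letter-spelling prior, and the toy Finnish probabilities are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch in the spirit of a unigram morph model; NOT the Morfessor code.


def viterbi_segment(word, morph_logprob):
    """Return the most probable split of `word` into known morphs.

    morph_logprob maps each morph string to its log-probability (assumed given).
    Returns None if the word cannot be covered by known morphs.
    """
    n = len(word)
    best = [-math.inf] * (n + 1)  # best[i]: best log-prob of the prefix word[:i]
    back = [0] * (n + 1)          # back[i]: start of the last morph in that best split
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece in morph_logprob and best[start] > -math.inf:
                score = best[start] + morph_logprob[piece]
                if score > best[end]:
                    best[end] = score
                    back[end] = start
    if best[n] == -math.inf:
        return None
    # Follow back-pointers from the end of the word to recover the morphs.
    morphs, end = [], n
    while end > 0:
        start = back[end]
        morphs.append(word[start:end])
        end = start
    return morphs[::-1]


def map_cost(segmentations, alphabet_size=27):
    """Simplified two-part cost in nats: -log P(lexicon) - log P(corpus | lexicon).

    segmentations: one list of morphs per word token in the corpus.
    """
    counts = Counter(m for segs in segmentations for m in segs)
    total = sum(counts.values())
    # Corpus cost: negative log-likelihood of morph tokens under ML unigram probabilities.
    corpus = -sum(c * math.log(c / total) for c in counts.values())
    # Lexicon cost (assumed prior): spell each distinct morph letter by letter plus an
    # end marker, each symbol drawn uniformly from alphabet_size + 1 symbols.
    lexicon = sum((len(m) + 1) * math.log(alphabet_size + 1) for m in counts)
    return lexicon + corpus


if __name__ == "__main__":
    # Toy, made-up probabilities for forms of Finnish "talo" (house).
    lp = {"talo": math.log(0.3), "ssa": math.log(0.3),
          "i": math.log(0.2), "talossa": math.log(0.01)}
    print(viterbi_segment("taloissa", lp))  # ['talo', 'i', 'ssa']
    seg = [["talo"], ["talo", "ssa"], ["talo", "i", "ssa"], ["talo"]]
    unseg = [["talo"], ["talossa"], ["taloissa"], ["talo"]]
    print(map_cost(seg) < map_cost(unseg))  # True
```

Under this toy cost, reusing `talo` and `ssa` across word forms makes the segmented lexicon cheaper than storing whole words, which is the intuition behind preferring morph segmentations. The model in the paper also stores information about the usage of morphs, which this sketch leaves out.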

Authors

Krista Lagus

Mathias Creutz

Unsupervised models for morpheme segmentation and morphology learning 论文

摘要

作者查看全部 (2)

相关技术查看全部 (2)

相关事件

相关文章