Interpolating between types and tokens by estimating power-law generators (paper)

2005 · Cited by 215

Natural Language Processing Techniques Topic Modeling Language and cultural evolution

Abstract

Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power-laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process -- the Pitman-Yor process -- as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
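The adaptor idea in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes the usual two-parameter Chinese restaurant process view of the Pitman-Yor process, with a discount `a` (0 <= a < 1) and a concentration `b` (b > -a). The function name `adapted_sample`, the toy generator, and the parameter names are hypothetical choices made for illustration.

```python
import random

def adapted_sample(generator, n_tokens, a, b, seed=0):
    """Draw n_tokens from a generator wrapped in a Pitman-Yor style adaptor.

    Assumed seating rule (two-parameter Chinese restaurant process), with n
    customers already seated at K tables:
      existing table k: probability (n_k - a) / (n + b)
      new table:        probability (b + a * K) / (n + b)
    Each new table is labelled with a fresh draw from `generator`.
    """
    if not (0 <= a < 1 and b > -a):
        raise ValueError("need 0 <= a < 1 and b > -a")
    rng = random.Random(seed)
    sizes, labels, tokens = [], [], []
    for n in range(n_tokens):
        if not sizes:
            choice = 0  # the first customer always opens a table
        else:
            r = rng.random() * (n + b)
            new_mass = b + a * len(sizes)
            if r < new_mass:
                choice = len(sizes)
            else:
                r -= new_mass
                choice = len(sizes) - 1  # fallback guards against float rounding
                for k, size in enumerate(sizes):
                    r -= size - a
                    if r < 0:
                        choice = k
                        break
        if choice == len(sizes):
            sizes.append(1)
            labels.append(generator(rng))  # one generator draw per table
        else:
            sizes[choice] += 1
        tokens.append(labels[choice])
    return tokens, sizes

# Toy generator (an assumption for this sketch): uniform over a tiny lexicon.
lexicon = ["walk", "walked", "walking", "talk", "talked"]
tokens, sizes = adapted_sample(lambda rng: rng.choice(lexicon), 1000, a=0.5, b=1.0)
```

In this sketch the generator is consulted once per table rather than once per token, so the rich-get-richer seating produces skewed token frequencies even though the toy generator is uniform. Larger values of `a` open new tables more readily and give a heavier tail of small tables.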

Authors (3)

Thomas L. Griffiths

Mark Johnson

Sharon Goldwater
