Sur la variabilité de la fréquence des formes dans un corpus (paper)
1980, Mots. Cited by: 216
Natural Language Processing Techniques
Abstract
FORM FREQUENCY VARIABILITY IN A CORPUS. P. L. studies how the frequency of a word is distributed across a corpus divided into several fragments. In contrast to the prevailing work in this field, he proposes using the formulae of the hypergeometric distribution, taking the whole corpus as the norm against which each fragment is measured. These choices lead to a probabilistic index that is valid across the whole range of frequencies. Computing this index for every form in the vocabulary makes it possible to define two complementary subsets of forms, specific forms and basic forms, and to assign to each fragment its own lexical specificities.
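
The index described in the abstract can be illustrated with a minimal sketch. It is not the author's published procedure: the abstract does not give the formulae, so the symbols `T` (corpus size in tokens), `t` (fragment size), `F` (frequency of the form in the corpus) and `k` (frequency of the form in the fragment), the 5% threshold, the rule for choosing which tail to sum, and the reading of "basic" as "specific in no fragment" are all assumptions made for illustration.

```python
from math import comb


def hypergeom_pmf(k, T, t, F):
    """P(X = k) under the hypergeometric model: a form with F occurrences
    in a corpus of T tokens occurs k times in a fragment of t tokens."""
    lowest, highest = max(0, t - (T - F)), min(t, F)
    if k < lowest or k > highest:
        return 0.0
    return comb(F, k) * comb(T - F, t - k) / comb(T, t)


def specificity(k, T, t, F, threshold=0.05):
    """Return (sign, p). sign is '+' for over-use, '-' for under-use,
    '0' when the count is unremarkable. The threshold is an assumption."""
    expected = F * t / T  # mean of the hypergeometric distribution
    if k >= expected:
        # Upper tail: probability of a count at least as high as observed.
        p = sum(hypergeom_pmf(j, T, t, F) for j in range(k, min(t, F) + 1))
        sign = "+"
    else:
        # Lower tail: probability of a count at least as low as observed.
        p = sum(hypergeom_pmf(j, T, t, F)
                for j in range(max(0, t - (T - F)), k + 1))
        sign = "-"
    return (sign if p <= threshold else "0"), p


def classify_form(counts, sizes, threshold=0.05):
    """counts[i] is the form's frequency in fragment i, sizes[i] that
    fragment's length. A form is labelled 'basic' here when it is
    specific in no fragment (an assumed reading of the abstract)."""
    T, F = sum(sizes), sum(counts)
    signs = [specificity(k, T, t, F, threshold)[0]
             for k, t in zip(counts, sizes)]
    kind = "basic" if all(s == "0" for s in signs) else "specific"
    return kind, signs


if __name__ == "__main__":
    # A form with 10 occurrences in a 1000-token corpus of two fragments.
    print(classify_form([5, 5], [100, 900]))
    print(classify_form([1, 9], [100, 900]))
```

Because the norm is the whole corpus, `T` and `F` are obtained by summing over the fragments rather than from an external reference corpus. The exact tail sums avoid a normal or chi-square approximation, which is one way to read the abstract's claim that the index holds for low as well as high frequencies.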