Distributional semantics in linguistic and cognitive research

2008 · CINECA IRIS Institutional Research Information System (University of Pisa) · Cited by 277
Natural Language Processing Techniques · Language and cultural evolution · Topic Modeling

Abstract

The hypothesis that word co-occurrence statistics extracted from text corpora can provide a basis for semantic representations has been attracting growing attention in both computational linguistics and cognitive science. The terms distributional, context-theoretic, corpus-based, and statistical can all be used (almost interchangeably) to qualify a rich family of approaches to semantics that share a “usage-based” perspective on meaning and assume that the statistical distribution of words in context plays a key role in characterizing their semantic behavior. Beyond this common core, the approaches differ in many respects: the specific mathematical and computational techniques, the type of semantic properties associated with text distributions, the definition of the linguistic context used to determine the combinatorial spaces of lexical items, and so on. Yet, on closer inspection, we may discover that the commonalities are greater than one might expect prima facie, and that a general model of meaning can indeed be discerned behind the differences: a model that formulates specific hypotheses on the format of semantic representations and on the way they are built and processed by the human mind.
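The abstract does not commit to a particular technique, but the core idea (meaning derived from co-occurrence statistics) can be illustrated with a minimal sketch. It assumes raw co-occurrence counts within a symmetric window of two tokens as the definition of linguistic context, and cosine similarity as the comparison measure. The toy corpus and the function names `cooccurrence_vectors` and `cosine` are hypothetical illustrations, not the author's model.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words appearing within `window` tokens of it.

    Assumption: context = symmetric word window; counts are not reweighted.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the target word itself
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    # Counter returns 0 for missing keys, so absent contexts contribute nothing.
    dot = sum(count * v[ctx] for ctx, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "the car needs fuel",
]

vecs = cooccurrence_vectors(corpus, window=2)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))  # 0.923
print(round(cosine(vecs["cat"], vecs["car"]), 3))  # 0.492
```

Words used in similar contexts ("cat" and "dog" both co-occur with "drinks" and "chases") receive more similar vectors than words that share only a function-word context ("cat" and "car" share just "the"). Practical distributional models typically reweight raw counts (for example with pointwise mutual information) and often reduce dimensionality; those steps are omitted here for brevity.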