Random indexing of text samples for latent semantic analysis (paper)

2000 · eScholarship (California Digital Library) · Cited by 387

Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques



Abstract

... SVD, the result is not nearly as good: only 36% correct. The authors conclude that the reorganization of information by SVD somehow corresponds to human psychology. We have studied high-dimensional random distributed representations as models of brainlike representation of information (Kanerva, 1994; Kanerva & Sjödin, 1999). In this poster we report on the use of such a representation to reduce the dimensionality of the original words-by-contexts matrix. The method can be explained by looking at the 60,000 × 30,000 matrix of frequencies above. Assume that each text sample is represented by a 30,000-bit vector with a single 1 marking the place of the sample in a list of all samples, and call it the sample's index vector (i.e., the nth bit of the index vector for the nth text sample is 1; the representation is unitary or local). Then the words-by-contexts matrix of frequencies can be obtained by the following procedure: every time that the word w occurs in the nth text sample, the
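The procedure with local index vectors can be sketched in Python on a toy corpus. The excerpt breaks off before it states what happens on each occurrence of a word; the sketch assumes that the sample's index vector is added to the matrix row for that word, which reproduces the frequency counts. The corpus and the names `one_hot` and `words_by_contexts` are illustrative and do not come from the paper.

```python
from collections import defaultdict


def one_hot(n, size):
    """Local ("unitary") index vector: a single 1 at position n."""
    vec = [0] * size
    vec[n] = 1
    return vec


def words_by_contexts(samples):
    """Build one row per word by adding the index vector of a text
    sample each time the word occurs in it (assumed update step)."""
    size = len(samples)
    rows = defaultdict(lambda: [0] * size)
    for n, sample in enumerate(samples):
        index_vector = one_hot(n, size)
        for word in sample.split():
            row = rows[word]
            # Element-wise addition of the index vector to the word's row.
            for i, bit in enumerate(index_vector):
                row[i] += bit
    return dict(rows)


# Toy corpus: three text samples instead of 30,000.
samples = ["the cat sat", "the dog sat on the mat", "a cat and a dog"]
matrix = words_by_contexts(samples)

for word in sorted(matrix):
    print(word, matrix[word])
```

With one-hot index vectors, each row is simply the word's occurrence count per sample, so the result has as many columns as there are text samples.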

Authors

Anders Holst

Jan Kristoferson

Pentti Kanerva
