Random indexing of text samples for latent semantic analysis (paper)
Abstract
SVD, the result is not nearly as good: only 36% correct. The authors conclude that the reorganization of information by SVD somehow corresponds to human psychology. We have studied high-dimensional random distributed representations as models of brainlike representation of information (Kanerva, 1994; Kanerva & Sjödin, 1999). In this poster we report on the use of such a representation to reduce the dimensionality of the original words-by-contexts matrix. The method can be explained by looking at the 60,000 × 30,000 matrix of frequencies above. Assume that each text sample is represented by a 30,000-bit vector with a single 1 marking the place of the sample in a list of all samples, and call it the sample's index vector (i.e., the nth bit of the index vector for the nth text sample is 1; the representation is unitary or local). Then the words-by-contexts matrix of frequencies can be obtained by the following procedure: every time that the word w occurs in the nth text sample, the