Dimensionality reduction by random mapping: fast similarity computation for clustering (paper)

2002 · Cited by 389


Artificial Intelligence · Neural Networks and Applications · Face and Expression Recognition · Machine Learning in Bioinformatics


Abstract

When the data vectors are high-dimensional, it is computationally infeasible to use data analysis or pattern recognition algorithms that repeatedly compute similarities or distances in the original data space. It is therefore necessary to reduce the dimensionality before, for example, clustering the data. If the dimensionality is very high, as in the WEBSOM method, which organizes textual document collections on a self-organizing map, then even commonly used dimensionality reduction methods such as principal component analysis may be too costly. It is demonstrated that the document classification accuracy obtained after the dimensionality has been reduced using a random mapping method is almost as good as the original accuracy if the final dimensionality is sufficiently large (about 100 out of 6000). In fact, it can be shown that the inner product (similarity) between the mapped vectors closely follows the inner product of the original vectors.
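The abstract does not spell out how the random matrix is built, so the following is only a minimal sketch under an assumption: R has Gaussian entries and each of its columns is scaled to unit length. The columns of R are then nearly orthogonal, so R^T R ≈ I and the mapped inner product x'·y' = x^T R^T R y stays close to x·y. The sketch maps hypothetical sparse word-count vectors from 6000 to 100 dimensions (the figures quoted in the abstract) and compares cosine similarities before and after. The helper names (`make_random_mapping`, `project`, `random_document`) and the synthetic documents are illustrative, not the paper's code or data.

```python
import math
import random


def make_random_mapping(d_in, d_out, seed=0):
    """Return R stored column by column: columns[j] is the d_out-dim image of input axis j.

    Assumption: Gaussian entries with each column normalized to unit length.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(d_in):
        col = [rng.gauss(0.0, 1.0) for _ in range(d_out)]
        norm = math.sqrt(sum(v * v for v in col))
        columns.append([v / norm for v in col])
    return columns


def project(columns, x):
    """Compute R @ x for a sparse input x given as {index: value}."""
    y = [0.0] * len(columns[0])
    for j, value in x.items():
        col = columns[j]
        for i in range(len(y)):
            y[i] += value * col[i]
    return y


def cosine_sparse(a, b):
    """Cosine similarity of two sparse {index: value} vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)


def cosine_dense(a, b):
    """Cosine similarity of two dense lists."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def random_document(rng, d_in, n_words, shared=None):
    """Hypothetical word-count vector: optional shared words plus n_words random draws."""
    doc = dict(shared or {})
    for _ in range(n_words):
        j = rng.randrange(d_in)
        doc[j] = doc.get(j, 0.0) + 1.0
    return doc


if __name__ == "__main__":
    D_IN, D_OUT = 6000, 100  # dimensionalities quoted in the abstract
    R = make_random_mapping(D_IN, D_OUT, seed=42)
    rng = random.Random(1)
    base = random_document(rng, D_IN, 80)
    doc_a = random_document(rng, D_IN, 40, shared=base)
    doc_b = random_document(rng, D_IN, 40, shared=base)  # shares vocabulary with doc_a
    doc_c = random_document(rng, D_IN, 120)  # unrelated document
    for name, other in (("a-b", doc_b), ("a-c", doc_c)):
        before = cosine_sparse(doc_a, other)
        after = cosine_dense(project(R, doc_a), project(R, other))
        print(f"{name}: original {before:.3f}  mapped {after:.3f}")
```

Storing R by columns lets a sparse document be mapped in time proportional to its number of distinct words times the target dimensionality, and building R needs no covariance matrix or eigendecomposition, unlike PCA. The exact printed values depend on the seeds; the mapped cosines scatter around the original ones, and the scatter shrinks as `D_OUT` grows.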

Author

Samuel Kaski
