A simple algorithm for nearest neighbor search in high dimensions

1997, IEEE Transactions on Pattern Analysis and Machine Intelligence, cited by 313
Data Management and Algorithms; Advanced Image and Video Retrieval Techniques; Computational Geometry and Mesh Generation

Abstract

The problem of finding the closest point in high-dimensional spaces is common in pattern recognition. Unfortunately, the complexity of most existing search algorithms, such as k-d tree and R-tree, grows exponentially with dimension, making them impractical for dimensionality above 15. In nearly all applications, the closest point is of interest only if it lies within a user-specified distance ε. We present a simple and practical algorithm to efficiently search for the nearest neighbor within Euclidean distance ε. The use of projection search combined with a novel data structure dramatically improves performance in high dimensions. A complexity analysis is presented which helps to automatically determine ε in structured problems. A comprehensive set of benchmarks clearly shows the superiority of the proposed algorithm for a variety of structured and unstructured search problems. Object recognition is demonstrated as an example application. The simplicity of the algorithm makes it possible to construct an inexpensive hardware search engine which can be 100 times faster than its software equivalent. A C++ implementation of our algorithm is available.
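
The abstract does not spell out the data structure, so the following is only a rough illustration of the general idea of projection search within a fixed distance, not the authors' implementation. It assumes one sorted coordinate list per dimension, uses binary search to keep the points that fall inside a slab of half-width `eps` around the query on each axis, intersects those slabs into a hypercube of candidates, and then checks the surviving candidates exhaustively against the Euclidean distance bound. The class name `ProjectionSearch` and the method `nearest_within` are hypothetical names chosen for this sketch.

```python
import bisect
import math


class ProjectionSearch:
    """Illustrative sketch: nearest neighbor within distance eps via per-axis slabs."""

    def __init__(self, points):
        self.points = [tuple(p) for p in points]
        self.dim = len(self.points[0]) if self.points else 0
        # For each dimension, store the coordinates in sorted order together
        # with the indices of the points they belong to.
        self.sorted_coords = []
        self.sorted_ids = []
        for d in range(self.dim):
            order = sorted(range(len(self.points)), key=lambda i: self.points[i][d])
            self.sorted_ids.append(order)
            self.sorted_coords.append([self.points[i][d] for i in order])

    def nearest_within(self, query, eps):
        """Return the index of the closest point within eps of query, or None."""
        candidates = None
        for d in range(self.dim):
            # Binary search for the slab [query[d] - eps, query[d] + eps].
            lo = bisect.bisect_left(self.sorted_coords[d], query[d] - eps)
            hi = bisect.bisect_right(self.sorted_coords[d], query[d] + eps)
            slab = set(self.sorted_ids[d][lo:hi])
            # Intersecting the slabs trims the candidates to a hypercube of side 2 * eps.
            candidates = slab if candidates is None else candidates & slab
            if not candidates:
                return None
        if not candidates:
            return None
        # The hypercube contains the eps-ball, so an exhaustive check of the
        # remaining candidates finds the true nearest neighbor within eps.
        best, best_dist = None, eps
        for i in candidates:
            dist = math.dist(self.points[i], query)
            if dist <= best_dist:
                best, best_dist = i, dist
        return best
```

A call such as `ProjectionSearch(points).nearest_within(query, eps)` returns an index into `points`, or `None` when no point lies within `eps`. The set intersection used here is a simplification chosen for brevity; the abstract credits its performance to a purpose-built data structure, which this sketch does not attempt to reproduce.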