Indexing the Distance: An Efficient Method to KNN Processing 论文

2001引用 285
Data Management and AlgorithmsAdvanced Image and Video Retrieval TechniquesVideo Analysis and Summarization

摘要

In this paper, we present an efficient method, called iDistance, for K-nearest neighbor (KNN) search in a high-dimensional space. iDistance partitions the data and selects a reference point for each partition. The data in each cluster are transformed into a single d-mensional space based on their similarity with respect to a reference point. This allows the points to be indexed using a B-tree structure and KNN search be performed using one-dimensional range search. The choice of par-tition and reference point provides the iDistance technique with degrees of freedom most other techniques do not have. We describe how appropriate choices here can effectively adapt the index structure to the data distribution. We conducted extensive experiments to evaluate the iDistance technique, and report results demonstrating its effectiveness.