Approximate medians and other quantiles in one pass and with limited memory (paper)

1998 · Cited by 266
Advanced Database Systems and Queries · Data Management and Algorithms · Algorithms and Data Compression

Abstract

We present new algorithms for computing approximate quantiles of large datasets in a single pass. The approximation guarantees are explicit, and apply regardless of the value distribution or the arrival distribution of the dataset. The main memory requirements are an order of magnitude smaller than those reported earlier. We also discuss methods that couple the approximation algorithms with random sampling to further reduce memory requirements. With sampling, the approximation guarantees are explicit but probabilistic, i.e., they apply with respect to a (user-controlled) confidence parameter. We present the algorithms, their theoretical analysis, and simulation results.

1 Introduction

This article studies the problem of computing order statistics of large sequences of online or disk-resident data using as little main memory as possible. We focus on computing quantiles, which are elements at specific positions in the sorted order of the input. The φ-quantile, for φ ∈ [0, ...
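
To make the random-sampling idea from the abstract concrete, here is a minimal Python sketch that estimates a quantile in one pass with bounded memory. It is not the paper's algorithm: it swaps in plain reservoir sampling (Algorithm R) followed by sorting the sample, and it carries none of the paper's explicit guarantees. The function name `reservoir_quantile`, the parameters `k` and `seed`, and the rank convention (the element at position ⌈φ·m⌉ of a sorted sample of size m) are assumptions made for illustration.

```python
import math
import random


def reservoir_quantile(stream, phi, k=1000, seed=0):
    """Estimate the phi-quantile of `stream` in one pass.

    Hypothetical helper: keeps a uniform random sample of at most k
    elements (reservoir sampling), then reads the quantile off the
    sorted sample. Memory use is O(k), independent of stream length.
    """
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    rng = random.Random(seed)
    sample = []
    for n, x in enumerate(stream, start=1):
        if n <= k:
            # Fill the reservoir with the first k elements.
            sample.append(x)
        else:
            # Keep the n-th element with probability k / n.
            j = rng.randrange(n)
            if j < k:
                sample[j] = x
    if not sample:
        raise ValueError("empty stream")
    sample.sort()
    # Assumed convention: 1-based rank ceil(phi * m), clamped to [1, m].
    rank = min(len(sample), max(1, math.ceil(phi * len(sample))))
    return sample[rank - 1]
```

For example, `reservoir_quantile(range(1, 101), 0.5)` holds the whole input in the reservoir and returns the exact median under this convention, 50. On longer streams the result is only an estimate whose accuracy depends on `k`.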