Practical Entropy-Compressed Rank/Select Dictionary 论文

2007Society for Industrial and Applied Mathematics eBooks引用 277
Algorithms and Data CompressionError Correcting Code TechniquesMachine Learning in Bioinformatics

摘要

Rank/Select dictionaries are data structures for an ordered set S ⊂ {0,1,…, n − 1} to compute runk(x, S) (the number of elements in S that are no greater than x), and select(i, S) (the i-th smallest element in S), which are the fundamental components of succinct data structures of strings, trees, graphs, etc‥ In these data structures, however, only asymptotic behavior has been considered and their performance for real data is not satisfactory. In this paper, we propose four novel Rank/Select dictionaries: esp, recrank, vcode and sdarray, each of which is small if the number of elements in S is small, and indeed close to nH0(S) (H0(S) < 1 is the zero-th order empirical entropy of S) in practice. Furthermore, their query times are superior to those of existing structures. Experimental results reveal the characteristics of our data structures and also show that these data structures are superior to existing implementations, both in terms of size and query time.