Apriori-based frequent itemset mining algorithms on MapReduce (paper)

2012, cited by 232

Data Mining Algorithms and Applications; Rough Sets and Fuzzy Logic; Data Management and Algorithms

Abstract

Many parallelization techniques have been proposed to enhance the performance of Apriori-like frequent itemset mining algorithms. Characterized by its map and reduce functions, MapReduce has emerged as a framework that excels at mining datasets of terabyte scale or larger on either homogeneous or heterogeneous clusters. Minimizing the scheduling overhead of each map-reduce phase and maximizing the utilization of nodes in each phase are the keys to a successful MapReduce implementation. In this paper, we propose three algorithms, named SPC, FPC, and DPC, to investigate effective implementations of the Apriori algorithm in the MapReduce framework. DPC dynamically combines candidates of various lengths and outperforms both the straightforward algorithm SPC and the fixed passes combined-counting algorithm FPC. Extensive experimental results also show that all three algorithms scale up linearly with respect to dataset size and cluster size.
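The abstract does not spell out how the counting is done, so the following is only a minimal sketch of level-wise Apriori counting expressed as map and reduce steps, simulated in plain Python on a single machine. It is not the authors' SPC, FPC, or DPC code. All function names (`map_phase`, `reduce_phase`, `generate_candidates`, `apriori_mapreduce`) and the toy data are assumptions made for illustration. Each loop iteration stands in for one map-reduce phase that counts candidates of a single length.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, candidates):
    """Emit (candidate, 1) for every candidate itemset contained in the transaction."""
    items = set(transaction)
    return [(c, 1) for c in candidates if set(c) <= items]


def reduce_phase(pairs, min_support):
    """Sum the counts per candidate and keep those meeting the support threshold."""
    counts = Counter()
    for candidate, n in pairs:
        counts[candidate] += n
    return {c: n for c, n in counts.items() if n >= min_support}


def generate_candidates(frequent, k):
    """Build k-candidates whose (k-1)-subsets are all frequent (Apriori pruning)."""
    items = sorted({i for itemset in frequent for i in itemset})
    previous = set(frequent)
    return [
        c for c in combinations(items, k)
        if all(sub in previous for sub in combinations(c, k - 1))
    ]


def apriori_mapreduce(transactions, min_support):
    """Run one simulated map-reduce phase per candidate length until none remain."""
    items = sorted({i for t in transactions for i in t})
    candidates = [(i,) for i in items]
    result = {}
    k = 1
    while candidates:
        # Map: every transaction is processed independently of the others.
        pairs = [p for t in transactions for p in map_phase(t, candidates)]
        # Reduce: aggregate the counts and filter by minimum support.
        frequent = reduce_phase(pairs, min_support)
        result.update(frequent)
        k += 1
        candidates = generate_candidates(frequent, k)
    return result


transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c"],
]
result = apriori_mapreduce(transactions, min_support=3)
```

In this sketch every candidate length costs one full phase, which is where the scheduling overhead mentioned in the abstract would accumulate. Counting candidates of several lengths within one phase, as the abstract describes for FPC and DPC, would reduce the number of phases; how the paper chooses which lengths to combine is not stated here and is not modeled.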

Authors (3)

Sue-Chen Hsueh

Pei-Yu Lee

Ming-Yen Lin
