MapReduce for Data Intensive Scientific Analyses 论文

2008引用 378
Cloud Computing and Resource ManagementAdvanced Data Storage TechnologiesScientific Computing and Data Management

摘要

Most scientific data analyses comprise analyzing voluminous data collected from various instruments. Efficient parallel/concurrent algorithms and frameworks are the key to meeting the scalability and performance requirements entailed in such scientific data analyses. The recently introduced MapReduce technique has gained a lot of attention from the scientific community for its applicability in large parallel data analyses. Although there are many evaluations of the MapReduce technique using large textual data collections, there have been only a few evaluations for scientific data analyses. The goals of this paper are twofold. First, we present our experience in applying the MapReduce technique for two scientific data analyses: (i) high energy physics data analyses; (ii) K-means clustering. Second, we present CGL-MapReduce, a streaming-based MapReduce implementation and compare its performance with Hadoop.

相关技术

暂无数据

相关事件

暂无数据

相关文章

暂无数据