Improving MapReduce performance through data placement in heterogeneous Hadoop clusters (Paper)

2010 · Cited by 381

Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Caching and Content Delivery


Abstract

MapReduce has become an important distributed processing model for large-scale data-intensive applications such as data mining and web indexing. Hadoop, an open-source implementation of MapReduce, is widely used for short jobs requiring low response time. The current Hadoop implementation assumes that the computing nodes in a cluster are homogeneous. Data locality is not taken into account when launching speculative map tasks, because most maps are assumed to be data-local. Unfortunately, neither the homogeneity assumption nor the data-locality assumption holds in virtualized data centers. We show that ignoring the data-locality issue in heterogeneous environments can noticeably reduce MapReduce performance. In this paper, we address the problem of how to place data across nodes so that each node has a balanced data-processing load. Given a data-intensive application running on a Hadoop MapReduce cluster, our data placement scheme adaptively balances the amount of data stored on each node to improve data-processing performance. Experimental results on two real data-intensive applications show that our data placement strategy can always improve MapReduce performance by rebalancing data across nodes before a data-intensive application runs in a heterogeneous Hadoop cluster.
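The abstract does not spell out the placement algorithm, so the following is only an illustrative sketch of the general idea of giving each node a share of the data that matches its processing capability. It assumes that a relative speed has already been measured for every node and that the input is divided into equal-sized blocks. The function name `place_blocks`, the node names, and the speed values are all hypothetical and are not taken from the paper or from any Hadoop API.

```python
def place_blocks(num_blocks, node_speeds):
    """Split num_blocks across nodes in proportion to their relative speed.

    node_speeds maps a (hypothetical) node name to a positive number that
    expresses how fast that node processes data relative to the others.
    Returns a dict mapping each node name to its number of blocks.
    """
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")
    if not node_speeds or any(s <= 0 for s in node_speeds.values()):
        raise ValueError("node_speeds must be non-empty with positive values")

    total_speed = sum(node_speeds.values())

    # Ideal (possibly fractional) share of blocks for each node.
    ideal = {n: num_blocks * s / total_speed for n, s in node_speeds.items()}

    # Start from the whole-number part of each ideal share.
    alloc = {n: int(share) for n, share in ideal.items()}

    # Hand the remaining blocks to the nodes with the largest fractional
    # parts (largest-remainder rounding), so the total is exactly num_blocks.
    leftover = num_blocks - sum(alloc.values())
    by_remainder = sorted(node_speeds, key=lambda n: ideal[n] - alloc[n], reverse=True)
    for n in by_remainder[:leftover]:
        alloc[n] += 1

    return alloc


if __name__ == "__main__":
    # A node that is three times as fast as the slowest one gets three
    # times as many blocks.
    print(place_blocks(12, {"fast": 3.0, "medium": 2.0, "slow": 1.0}))
    # {'fast': 6, 'medium': 4, 'slow': 2}
```

With this kind of proportional split, faster nodes hold more local data, so they are less likely to finish early and then pull blocks from slower nodes over the network.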

Authors (8)

Xiao Qin

Adam Manzanares

James Majors

Yun Tian
