Modeling and optimizing MapReduce programs

Bibliographic Details
Published in: Concurrency and computation, Vol. 27, no. 7, pp. 1734-1766
Main Authors: Dörre, Jens, Apel, Sven, Lengauer, Christian
Format: Journal Article
Language: English
Published: 01-05-2015
Description
Summary: MapReduce frameworks allow programmers to write distributed, data-parallel programs that operate on multisets. These frameworks offer considerable flexibility to support various kinds of programs and data. To understand the essence of the programming model better and to provide a rigorous foundation for optimizations, we present an abstract, functional model of MapReduce along with a number of customization options. We demonstrate that the MapReduce programming model can also represent programs that operate on lists, which differ from multisets in that the order of elements matters. Along with the functional model, we offer a cost model that allows programmers to estimate and compare the performance of MapReduce programs. Based on the cost model, we introduce two transformation rules aiming at performance optimization of MapReduce programs, which also demonstrates the usefulness of our model. In an exploratory study, we assess the impact of applying these rules to two applications. The functional model and the cost model provide insights at a proper level of abstraction into why the optimization works. Copyright © 2014 John Wiley & Sons, Ltd.
ISSN: 1532-0626, 1532-0634
DOI: 10.1002/cpe.3333