Modeling and optimizing MapReduce programs
Published in: | Concurrency and computation Vol. 27; no. 7; pp. 1734 - 1766 |
---|---|
Main Authors: | , , |
Format: | Journal Article |
Language: | English |
Published: | 01-05-2015 |
Summary: | MapReduce frameworks allow programmers to write distributed, data-parallel programs that operate on multisets. These frameworks offer considerable flexibility to support various kinds of programs and data. To understand the essence of the programming model better and to provide a rigorous foundation for optimizations, we present an abstract, functional model of MapReduce along with a number of customization options. We demonstrate that the MapReduce programming model can also represent programs that operate on lists, which differ from multisets in that the order of elements matters. Along with the functional model, we offer a cost model that allows programmers to estimate and compare the performance of MapReduce programs. Based on the cost model, we introduce two transformation rules aimed at optimizing the performance of MapReduce programs, which also demonstrates the usefulness of our model. In an exploratory study, we assess the impact of applying these rules to two applications. The functional model and the cost model provide insights, at a proper level of abstraction, into why the optimization works. Copyright © 2014 John Wiley & Sons, Ltd. |
---|---|
ISSN: | 1532-0626, 1532-0634 |
DOI: | 10.1002/cpe.3333 |
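
The summary describes a functional view of MapReduce over multisets and notes that lists can be represented as well. The following is a minimal illustrative sketch of that general idea in Python, not the model defined in the article: the names `map_reduce`, `count_words`, and `reverse_list` are hypothetical, everything runs sequentially in one process, and the list encoding (pairing each element with its index) is an assumption chosen for illustration.

```python
from collections import defaultdict


def map_reduce(mapper, reducer, records):
    """Sequential sketch of MapReduce (hypothetical helper, not the article's model).

    records: iterable of (key, value) pairs, treated as a multiset.
    mapper:  (key, value) -> iterable of intermediate (key, value) pairs.
    reducer: (key, list of values) -> iterable of output (key, value) pairs.
    """
    # Map phase followed by grouping of intermediate values by key.
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)

    # Reduce phase: one reducer call per intermediate key.
    result = []
    for out_key, values in groups.items():
        result.extend(reducer(out_key, values))
    return result


def count_words(documents):
    """Word count over (doc_id, text) records; the result is an unordered multiset."""
    def mapper(_doc_id, text):
        return [(word, 1) for word in text.split()]

    def reducer(word, counts):
        return [(word, sum(counts))]

    return map_reduce(mapper, reducer, documents)


def reverse_list(items):
    """Reverse a list by encoding it as a multiset of (index, element) pairs.

    Assumption for illustration: order is carried explicitly in the keys,
    so the MapReduce step itself never relies on the order of its input.
    """
    n = len(items)

    def mapper(index, element):
        return [(n - 1 - index, element)]

    def reducer(index, elements):
        return [(index, element) for element in elements]

    pairs = map_reduce(mapper, reducer, enumerate(items))
    # Decode the multiset back into a list by sorting on the index key.
    return [element for _, element in sorted(pairs)]
```

For example, `count_words([(0, "a b a"), (1, "b c")])` yields the pairs `("a", 2)`, `("b", 2)`, and `("c", 1)` in no guaranteed order, while `reverse_list([1, 2, 3])` recovers an ordered result because the indices travel with the data.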