BDEv 3.0: Energy efficiency and microarchitectural characterization of Big Data processing frameworks

Bibliographic Details
Published in: Future Generation Computer Systems, Vol. 86, pp. 565-581
Main Authors: Veiga, Jorge; Enes, Jonatan; Expósito, Roberto R.; Touriño, Juan
Format: Journal Article
Language: English
Published: Elsevier B.V., 01-09-2018
Description
Summary: As the size of Big Data workloads keeps increasing, evaluating distributed frameworks becomes crucial for identifying potential performance bottlenecks that may delay the processing of large datasets. While most existing works focus only on execution time and resource utilization, analyzing other important metrics is key to fully understanding the behavior of these frameworks. For example, microarchitecture-level events can provide meaningful insights into the interaction between frameworks and hardware. Moreover, energy consumption is gaining increasing attention as systems scale to thousands of cores. This work discusses the current state of the art in evaluating distributed processing frameworks, while extending our Big Data Evaluator tool (BDEv) to extract energy efficiency and microarchitecture-level metrics from the execution of representative Big Data workloads. An experimental evaluation using BDEv demonstrates its usefulness for extracting meaningful information from popular frameworks such as Hadoop, Spark and Flink.
Highlights:
• A comprehensive state-of-the-art survey of the benchmarking of Big Data systems.
• Proposal of BDEv 3.0, a holistic evaluation tool for Big Data processing frameworks.
• BDEv includes resource usage, energy efficiency and microarchitectural metrics.
• A practical use case of BDEv comparing current versions of Hadoop, Spark and Flink.
ISSN: 0167-739X, 1872-7115
DOI: 10.1016/j.future.2018.04.030