Building block components to control a data rate in the Apache Hadoop compute platform

Bibliographic Details
Published in: 2015 18th International Conference on Intelligence in Next Generation Networks, pp. 23-29
Main Authors: Do, Tien Van, Vu, Binh T., Do, Nam H., Farkas, Lorant, Rotter, Csaba, Tarjanyi, Tamas
Format: Conference Proceeding
Language: English
Published: IEEE 01-01-2015
Description
Summary: Resource management is one of the most indispensable components of cluster-level infrastructure layers. Users of such systems should be able to specify their job requirements as configuration parameters (CPU, memory, disk I/O, network I/O) that are translated into appropriate resource reservation and resource allocation decisions by the resource management function. YARN is an emerging resource management framework in the Hadoop ecosystem, which supports only memory and CPU reservation at present. In this paper, we propose a solution that takes into account the operation of the Hadoop Distributed File System to control the data rate of applications in the framework of a Hadoop compute platform. We utilize the property that a data pipe between a container and a DataNode consists of a disk I/O subpipe and a TCP/IP subpipe. We have implemented building block software components to control the data rate of data pipes between containers and DataNodes and provide a proof-of-concept with measurement results.
DOI: 10.1109/ICIN.2015.7073802