Performance of Hadoop Application on Hybrid Cloud

Bibliographic Details
Published in: 2015 International Conference on Cloud Computing Research and Innovation (ICCCRI), pp. 130-138
Main Authors: Ohnaga, Hayata; Aida, Kento; Abdul-Rahman, Omar
Format: Conference Proceeding
Language: English
Published: IEEE 01-10-2015
Description
Summary: Hadoop is an open-source software framework for distributed computing that is widely used to develop large-scale data processing applications, such as big data applications. Hadoop application programs are normally run on in-house or cloud computing platforms. Recently, a hybrid cloud composed of in-house and remote cloud computing platforms has been found to be capable of sustaining a certain level of application performance. In this paper, we discuss the performance of a Hadoop application program running on such hybrid clouds. We will begin by presenting the performance model used to estimate the execution time of a Hadoop application program running on a hybrid cloud. Then, we will show the results of experiments conducted on hybrid cloud test beds. These experimental results revealed that the performance levels of the Hadoop application programs running on the hybrid cloud were application-type dependent, and that performance improvements could be expected by using a remote cloud computing platform in conjunction with in-house computing platforms for certain types of applications. Furthermore, the results showed that our performance model captured the performance trend of the application programs on the hybrid cloud. However, room for improvement still exists in the performance model, particularly for the shuffle phase.
DOI: 10.1109/ICCCRI.2015.25