Google Users as Sequences: A Robust Hierarchical Cluster Analysis Study

Bibliographic Details
Published in: IEEE Transactions on Cloud Computing, Vol. 8; no. 1; pp. 167-179
Main Authors: Abdul-Rahman, Omar Arif; Aida, Kento
Format: Journal Article
Language: English
Published: Piscataway: IEEE Computer Society, 01-01-2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Description
Summary: In the era of cloud computing, users face the challenging task of composing and running their applications effectively on the cloud. By understanding how users construct applications and interact with typical cloud infrastructures, cloud managers can develop systems that improve the users' experience. In this paper, we analyze a large dataset from a Google cluster to characterize users into distinct groups with similar usage behavior. We use a wide range of measured metrics to model how users compose applications, in terms of actions around application architecting, capacity planning, and workload type planning, and to model how users interact with the cluster from a session view. The trajectories of users' actions are represented as sequences using categorical and proportional encoding schemes. We use techniques from the sequence analysis paradigm to quantify dissimilarity among users, and we employ a robust cluster analysis procedure based on agglomerative hierarchical methods to optimally classify users into 12 classes. A variety of formal indices and visual aids confirm the quality and stability of the outcomes. By visual inspection, we regroup the obtained clusters into 5 main groups that reveal insights into the characteristics that underlie the different groups' utilization behavior.
ISSN: 2168-7161, 2372-0018
DOI: 10.1109/TCC.2017.2766227
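
Illustrative sketch (not from the paper): the summary describes encoding each user's actions as a categorical sequence, measuring dissimilarity between sequences, and grouping users with an agglomerative hierarchical method. The minimal Python example below shows that general pipeline on toy data. Everything specific in it is an assumption made for illustration: the user names, the state labels "S", "M", and "L", the choice of Hamming distance as the dissimilarity measure, and the choice of average linkage. The record does not state which measure or linkage the authors used.

```python
from itertools import combinations

# Hypothetical toy data: one fixed-length categorical sequence per user.
# The labels "S", "M", "L" are invented and carry no meaning from the paper.
users = {
    "u1": "SSSSMM",
    "u2": "SSSMMM",
    "u3": "LLLLLL",
    "u4": "LLLLLM",
    "u5": "MMSSSS",
}


def hamming(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage(seqs, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    Distance between clusters is the mean pairwise Hamming distance
    (average linkage). Returns a list of clusters, each a list of indices.
    """
    n = len(seqs)
    # Precompute pairwise dissimilarities, keyed by (smaller, larger) index.
    dist = {(i, j): hamming(seqs[i], seqs[j])
            for i, j in combinations(range(n), 2)}

    def between(c1, c2):
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(dist[p] for p in pairs) / len(pairs)

    clusters = [[i] for i in range(n)]  # start with every user alone
    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: between(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return clusters


names = list(users)
index_groups = average_linkage([users[n] for n in names], n_clusters=2)
groups = sorted(sorted(names[i] for i in g) for g in index_groups)
print(groups)
```

On this toy input the users with mostly "S" and "M" states end up in one group and the users with mostly "L" states in the other. A real analysis would also need a way to choose the number of clusters and to check their stability, which the summary says the authors did with formal indices and visual aids.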