A Low-Cost Multi-failure Resilient Replication Scheme for High Data Availability in Cloud Storage

Bibliographic Details
Published in: 2016 IEEE 23rd International Conference on High Performance Computing (HiPC), pp. 242-251
Main Authors: Jinwei Liu, Haiying Shen
Format: Conference Proceeding
Language: English
Published: IEEE 01-12-2016
Description
Summary: Replication is a common approach to enhancing data availability in cloud storage systems. Previously proposed replication schemes cannot effectively handle both correlated and non-correlated machine failures while increasing data availability with limited resources. Schemes designed for correlated machine failures must create a constant number of replicas for each data object, which neglects differences in data popularity and cannot use resources to maximize the expected data availability. The previous schemes also neglect the consistency maintenance cost and the storage cost caused by replication. To maximize revenue, cloud providers must maximize data availability (and hence minimize SLA violations) while minimizing the cost caused by replication. In this paper, we build a nonlinear integer programming model that maximizes data availability under both types of failures and minimizes the cost caused by replication. Based on the model's solution for the replication degree of each data object, we propose a low-cost multi-failure resilient replication scheme (MRR). MRR effectively handles both correlated and non-correlated machine failures, considers data popularity to enhance data availability, and also tries to minimize the consistency maintenance cost and the storage cost. Extensive numerical results based on trace parameters and experiments on real-world Amazon S3 show that MRR achieves high data availability, low data loss probability, and low consistency maintenance and storage costs compared to previous replication schemes.
DOI: 10.1109/HiPC.2016.036
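
Illustrative sketch (not from the paper): the summary's point that a constant replica count ignores data popularity can be shown with a toy calculation. The code below is not the paper's nonlinear integer programming model or the MRR scheme. It assumes only independent (non-correlated) failures, where an object with r replicas on machines that each fail with probability p is unavailable with probability p^r, and it ignores correlated failures and consistency maintenance cost. The function names, the greedy strategy, and the sample numbers are all hypothetical.

```python
import heapq


def allocate_replicas(popularity, budget, p_fail, min_replicas=1):
    """Toy greedy allocation of a replica budget (hypothetical helper).

    Assumes independent machine failures with probability p_fail and a
    unit storage cost per replica. Not the paper's optimization model.
    """
    n = len(popularity)
    if budget < n * min_replicas:
        raise ValueError("budget too small for the minimum replica count")
    replicas = [min_replicas] * n
    # Marginal gain of going from r to r+1 replicas for one object:
    # pop * (p^r - p^(r+1)) = pop * p^r * (1 - p).
    # heapq is a min-heap, so gains are stored negated.
    heap = [(-pop * p_fail ** min_replicas * (1 - p_fail), i)
            for i, pop in enumerate(popularity)]
    heapq.heapify(heap)
    for _ in range(budget - n * min_replicas):
        _, i = heapq.heappop(heap)
        replicas[i] += 1
        gain = popularity[i] * p_fail ** replicas[i] * (1 - p_fail)
        heapq.heappush(heap, (-gain, i))
    return replicas


def expected_availability(popularity, replicas, p_fail):
    """Popularity-weighted probability that a requested object is available."""
    total = sum(popularity)
    return sum(pop * (1 - p_fail ** r)
               for pop, r in zip(popularity, replicas)) / total


# Hypothetical workload: three objects with skewed request shares.
popularity = [0.8, 0.15, 0.05]
greedy = allocate_replicas(popularity, budget=6, p_fail=0.2)
uniform = [2, 2, 2]
print(greedy)
print(expected_availability(popularity, greedy, 0.2))
print(expected_availability(popularity, uniform, 0.2))
```

With the same total of six replicas, the popularity-aware allocation gives the most requested object more copies and yields a higher weighted availability than two copies per object, which is the effect the summary attributes to considering data popularity.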