Leap-based Content Defined Chunking - Theory and Implementation

Bibliographic Details
Published in: 2015 31st Symposium on Mass Storage Systems and Technologies (MSST), pp. 1-12
Main Authors: Chuanshuai Yu, Chengwei Zhang, Yiping Mao, Fulu Li
Format: Conference Proceeding
Language: English
Published: IEEE 01-05-2015
Description
Summary: Content Defined Chunking (CDC) is an important component of data deduplication that affects both the deduplication ratio and deduplication performance. The sliding-window-based CDC algorithm and its variants have been the most popular CDC algorithms for the last 15 years. However, their performance is limited in certain application scenarios because they must slide the window one byte at a time. The authors present a leap-based CDC algorithm that significantly improves deduplication performance without compromising the deduplication ratio. Compared to the sliding-window-based CDC algorithm, the new algorithm delivers up to a two-fold improvement in performance.
ISSN: 2160-195X, 2160-1968
DOI: 10.1109/MSST.2015.7208290
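
Illustration: the summary contrasts the leap-based algorithm with the sliding-window baseline, which has to examine every byte. The following is a minimal sketch of that baseline only, not the authors' leap-based method, which this record does not describe in detail. The polynomial rolling hash and every constant and name below (WINDOW, MASK, MIN_SIZE, MAX_SIZE, BASE, MOD, chunk_boundaries) are assumptions chosen for illustration.

```python
# Illustrative sliding-window CDC baseline (NOT the leap-based algorithm).
# All constants are assumed values, not taken from the paper.
WINDOW = 48               # bytes covered by the rolling hash
MASK = (1 << 11) - 1      # cut when the low 11 bits of the hash are zero
MIN_SIZE = 512            # smallest chunk allowed before a cut is considered
MAX_SIZE = 8192           # forced cut if no boundary is found earlier
BASE = 257                # polynomial base for the rolling hash
MOD = (1 << 61) - 1       # modulus for the rolling hash


def chunk_boundaries(data: bytes) -> list:
    """Return the exclusive end offset of each chunk in `data`."""
    boundaries = []
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    start = 0                         # offset where the current chunk begins
    h = 0                             # rolling hash of the current window
    for i, byte in enumerate(data):   # one step per byte: the cost noted above
        pos = i - start
        if pos >= WINDOW:
            # Drop the oldest byte so the hash covers only the last WINDOW bytes.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        size = pos + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            boundaries.append(i + 1)
            start = i + 1
            h = 0
    if start < len(data):
        boundaries.append(len(data))  # trailing partial chunk
    return boundaries
```

Because the cut decision depends on the bytes inside the window rather than on absolute offsets, boundaries tend to realign after an insertion or deletion, which is what makes chunking content-defined. The loop still performs one hash update per input byte, and that per-byte work is the overhead the summary says the leap-based approach reduces.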