An Effective Way To Reduce Network Transmission In Backup System

Bibliographic Details
Published in: 2022 23rd IEEE International Conference on Mobile Data Management (MDM), pp. 125-127
Main Authors: Chao, Yun; Su, JinDian
Format: Conference Proceeding
Language: English
Published: IEEE 01-06-2022
Description
Summary: Content-defined chunking (CDC) algorithms play an important role in data deduplication, data synchronization, and cloud storage. Existing CDC algorithms suffer from unstable chunk-size variance and low chunking throughput when processing low-entropy strings. To solve these problems, this paper proposes the Double Extreme (DE) and Rapid Double Extreme (RDE) CDC algorithms. Both DE and RDE are hash-free chunking algorithms. DE uses the byte values in the sliding window to determine the cut point. Using both the maximum and the minimum allows DE to better handle low-entropy strings and to achieve a small chunk-size variance. RDE builds on DE and uses a multi-step strategy to achieve higher chunking throughput. We compared DE and RDE with existing CDC algorithms. The experimental results show that DE and RDE significantly reduce chunk-size variance and improve chunking throughput compared with other CDC algorithms.
ISSN: 2375-0324
DOI: 10.1109/MDM55031.2022.00038
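The summary describes DE only at a high level: cut points are chosen from byte values in a sliding window, using both the maximum and the minimum, with no rolling hash. The Python sketch below illustrates that general idea of a hash-free, extreme-value cut rule under stated assumptions. It is not the authors' DE or RDE algorithm. The function name `extreme_cut_points`, the parameters `window`, `min_size` and `max_size`, and the specific rule (cut once the running maximum or running minimum has gone unbeaten for `window` bytes) are all hypothetical.

```python
from typing import List, Optional


def extreme_cut_points(data: bytes, window: int = 64,
                       min_size: int = 256, max_size: int = 8192) -> List[int]:
    """Return exclusive chunk end offsets for `data`.

    Hypothetical hash-free rule (illustration only, not the paper's DE):
    from each chunk start, track the position of the running maximum byte
    and the running minimum byte. Once either extreme has survived `window`
    further bytes without being strictly beaten, and the chunk has reached
    `min_size` bytes, cut right after the current byte. No chunk exceeds
    `max_size` bytes; only the final chunk may be shorter than `min_size`.
    """
    cuts: List[int] = []
    start = 0
    n = len(data)
    while start < n:
        max_pos = min_pos = start  # positions of the running extremes
        end: Optional[int] = None
        i = start + 1
        while i < n and i - start < max_size:
            b = data[i]
            if b > data[max_pos]:  # strictly larger byte: new maximum
                max_pos = i
            if b < data[min_pos]:  # strictly smaller byte: new minimum
                min_pos = i
            size = i - start + 1
            # Cut when either extreme has been stable for `window` bytes.
            if size >= min_size and (i - max_pos >= window or i - min_pos >= window):
                end = i + 1
                break
            i += 1
        if end is None:
            # No content-defined cut: stop at max_size or at end of input.
            end = min(start + max_size, n)
        cuts.append(end)
        start = end
    return cuts
```

With the default parameters, an all-zero input such as `bytes(1000)` yields `[256, 512, 768, 1000]`: neither extreme ever changes, so every cut falls at `min_size`. This shows why low-entropy input needs special care in a cut rule, which is the problem the summary says DE's combined maximum/minimum strategy is designed to handle.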