Cyclic Storage for Fault-Tolerant Distributed Executions

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, Vol. 17, no. 9, pp. 1028-1036
Main Authors: Marcelín-Jiménez, R., Rajsbaum, S., Stevens, B.
Format: Journal Article
Language: English
Published: New York IEEE 01-09-2006
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Description
Summary: Given a set $V$ of active components in charge of a distributed execution, a storage scheme is a sequence $B_0, B_1, \ldots, B_{b-1}$ of subsets of $V$, where successive global states are recorded. The subsets, also called blocks, have the same size and are scheduled according to some fixed and cyclic calendar of $b$ steps. During the $i$th step, block $B_i$ is selected. Each component takes a copy of its local state and sends it to one of the components in $B_i$, in such a way that each component stores (approximately) the same number of local states. Afterward, if a component of $B_i$ crashes, all of its stored data is lost and the computation cannot continue. If there exists a block with no failed components in it, then a recent global state can be retrieved and the computation does not need to start over from the very beginning. The goal is to design storage schemes that tolerate as many crashes as possible, while trying to have each component participating in as few blocks as possible and, at the same time, working with large blocks (so that a component in a block stores a small number of local states). In this paper, several such schemes are described and compared in terms of these measures.
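The recovery condition in the summary (a global state survives if some block contains no failed component) can be illustrated with a small sketch. The block layout below, where block i holds k consecutive component indices, is an assumption chosen only for illustration and is not claimed to be one of the schemes from the paper. The function names (cyclic_blocks, assign_states, recoverable, crash_tolerance) are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations


def cyclic_blocks(n, k, b):
    """Illustrative layout (an assumption, not the paper's construction):
    block i holds the k consecutive components starting at i*k mod n."""
    return [frozenset((i * k + j) % n for j in range(k)) for i in range(b)]


def assign_states(n, block):
    """Send each component's local state to a member of the block,
    round-robin, so that holders store roughly equal numbers of states."""
    holders = sorted(block)
    return {c: holders[c % len(holders)] for c in range(n)}


def recoverable(blocks, failed):
    """A recent global state survives if some block has no failed member."""
    return any(block.isdisjoint(failed) for block in blocks)


def crash_tolerance(n, blocks):
    """Largest f such that every set of f crashes leaves an intact block.
    Brute force over all crash sets, so only suitable for small n."""
    f = 0
    while f < n and all(
        recoverable(blocks, set(s)) for s in combinations(range(n), f + 1)
    ):
        f += 1
    return f


# Six components, three disjoint blocks of size two.
blocks = cyclic_blocks(n=6, k=2, b=3)
load = Counter(assign_states(6, blocks[0]).values())
tolerance = crash_tolerance(6, blocks)
```

With these parameters each component belongs to exactly one block, each holder in the selected block stores three local states, and any two crashes leave at least one block intact, while three well-placed crashes (one per block) do not. This shows the trade-off named in the summary between crash tolerance, block membership per component, and block size.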
ISSN: 1045-9219, 1558-2183
DOI: 10.1109/TPDS.2006.120