A variational calculus approach to optimal checkpoint placement


Bibliographic Details
Published in: IEEE Transactions on Computers, Vol. 50, no. 7, pp. 699-708
Main Authors: Ling, Yibei; Mi, Jie; Lin, Xiaola
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01-07-2001
Description
Summary: Checkpointing is an effective fault-tolerance technique for improving system availability and reliability. However, blind checkpoint placement can result in either performance degradation or expensive recovery cost. By means of the calculus of variations, we derive an explicit formula that links the optimal checkpointing frequency with a general failure rate, with the objective of globally minimizing the total expected cost of checkpointing and recovery. The theoretical result shows that the optimal checkpointing frequency is proportional to the square root of the failure rate and can be uniquely determined by the failure rate (time-varying or constant) if the recovery function is strictly increasing and the failure rate satisfies λ(∞) > 0. J.L. Bruno and E.G. Coffman (1997) suggest that optimal checkpointing is by its nature a function of the system failure rate, i.e., a time-varying failure rate demands time-varying checkpointing in order to meet certain optimality criteria. The results obtained in this paper agree with their viewpoint.
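
The square-root relationship stated in the summary can be illustrated with a small sketch. This is not the paper's formula, which the summary does not reproduce. It assumes a simplified first-order cost model: each checkpoint costs a fixed amount `checkpoint_cost`, and a failure loses on average half a checkpoint interval of work, so the cost per unit time at checkpoint frequency n is `checkpoint_cost * n + failure_rate / (2 * n)`. Minimizing that expression pointwise gives a frequency proportional to the square root of the failure rate. The function names, the Weibull hazard used as a time-varying failure rate, and all numeric values are hypothetical choices for illustration only.

```python
import math


def weibull_failure_rate(t, shape=1.5, scale=100.0):
    """Time-varying failure rate (Weibull hazard). Illustrative assumption only."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def cost_density(n, failure_rate, checkpoint_cost):
    """Assumed cost per unit time at checkpoint frequency n.

    checkpoint_cost * n    : overhead of taking n checkpoints per unit time
    failure_rate / (2 * n) : expected rework, half an interval lost per failure
    """
    return checkpoint_cost * n + failure_rate / (2.0 * n)


def checkpoint_frequency(failure_rate, checkpoint_cost):
    """Frequency that minimizes cost_density for the given failure rate.

    Setting the derivative with respect to n to zero gives
    n = sqrt(failure_rate / (2 * checkpoint_cost)).
    """
    return math.sqrt(failure_rate / (2.0 * checkpoint_cost))


if __name__ == "__main__":
    checkpoint_cost = 0.5  # hypothetical cost of one checkpoint, in time units
    for t in (10.0, 100.0, 400.0):
        rate = weibull_failure_rate(t)
        n = checkpoint_frequency(rate, checkpoint_cost)
        print(f"t={t:6.1f}  failure rate={rate:.5f}  checkpoints per unit time={n:.5f}")
```

Under this assumed model, quadrupling the failure rate doubles the checkpoint frequency, and a failure rate that changes over time yields a checkpoint frequency that changes with it, which is the qualitative behavior the summary describes.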
ISSN: 0018-9340, 1557-9956
DOI: 10.1109/12.936236