Autonomic Management of Large Clusters and Their Integration into the Grid

Bibliographic Details
Published in: Journal of Grid Computing Vol. 2; no. 3; pp. 247-260
Main Authors: Röblitz, Thomas; Schintke, Florian; Reinefeld, Alexander; Bärring, Olof; Barroso Lopez, Maite; Cancio, German; Chapeland, Sylvain; Chouikh, Karim; Cons, Lionel; Poznański, Piotr; Defert, Philippe; Iven, Jan; Kleinwort, Thorsten; Panzer-Steindel, Bernd; Polok, Jaroslaw; Rafflin, Catherine; Silverman, Alan; Smith, Tim; Eldik, Jan; Front, David; Biasotto, Massimo; Aiftimiei, Cristina; Ferro, Enrico; Maron, Gaetano; Chierici, Andrea; Dell’agnello, Luca; Serra, Marco; Michelotto, Michele; Hess, Lord; Lindenstruth, Volker; Pister, Frank; Morten Steinbeck, Timm; Groep, David; Steenbakkers, Martijn; Koeroo, Oscar; de Cerff, Wim Som; Venekamp, Gerben; Anderson, Paul; Colles, Tim; Holt, Alexander; Scobie, Alastair; George, Michael; Washbrook, Andrew; Leiva, Rafael A. García
Format: Journal Article
Language: English
Published: Dordrecht Springer Nature B.V 01-09-2004
Description
Summary: We present a framework for the co-ordinated, autonomic management of multiple clusters in a compute center and their integration into a Grid environment. Site autonomy and the automation of administrative tasks are prime aspects in this framework. The system behavior is continuously monitored in a steering cycle and appropriate actions are taken to resolve any problems. All presented components have been implemented in the course of the EU project DataGrid: the Lemon monitoring components, the FT fault-tolerance mechanism, the quattor system for software installation and configuration, the RMS job and resource management system, and the Gridification scheme that integrates clusters into the Grid.
ISSN: 1570-7873, 1572-9184
DOI: 10.1007/s10723-004-7647-3