A network-failure-tolerant message-passing system for terascale clusters

Bibliographic Details
Published in: International Journal of Parallel Programming, Vol. 31, no. 4, pp. 285-303
Main Authors: Graham, Richard L; Choi, Sung-eun; Daniel, David J; Desai, Nehal N; Minnich, Ronald G; Rasmussen, Craig E; Risinger, L Dean; Sukalski, Mitchel W
Format: Journal Article
Language: English
Published: New York: Springer Nature B.V., 01-08-2003
Description
Summary: The Los Alamos Message Passing Interface (LA-MPI) is an end-to-end network-failure-tolerant message-passing system designed for terascale clusters. LA-MPI is a standard-compliant implementation of MPI designed to tolerate network-related failures including I/O bus errors, network card errors, and wire-transmission errors. This paper details the distinguishing features of LA-MPI, including support for concurrent use of multiple types of network interface, and reliable message transmission utilizing multiple network paths and routes between a given source and destination. In addition, performance measurements on production-grade platforms are presented.
ISSN: 0885-7458, 1573-7640
DOI: 10.1023/A:1024504726988