Smart predictive maintenance for high-performance computing systems: a literature review

Bibliographic Details
Published in: The Journal of Supercomputing Vol. 77; no. 11; pp. 13494-13513
Main Authors: Lima, André Luis da Cunha Dantas; Aranha, Vitor Moraes; Carvalho, Caio Jordão de Lima; Nascimento, Erick Giovani Sperandio
Format: Journal Article
Language: English
Published: New York Springer US 01-11-2021
Springer Nature B.V.
Description
Summary: Predictive maintenance is an invaluable tool to preserve the health of mission-critical assets while minimizing the operational costs of scheduled intervention. Artificial intelligence techniques have been shown to be effective at treating large volumes of data, such as those collected by the sensors typically present in equipment. In this work, we aim to identify and summarize existing publications in the field of predictive maintenance that explore machine learning and deep learning algorithms to improve the performance of failure classification and detection. We show a significant upward trend in the use of deep learning methods on sensor data collected from mission-critical assets for early failure detection to assist predictive maintenance schedules. We also identify aspects that require further investigation in future work, namely the exploration of life support systems for supercomputing assets and the standardization of performance metrics.
ISSN: 0920-8542
1573-0484
DOI: 10.1007/s11227-021-03811-7