On the Memory Underutilization: Exploring Disaggregated Memory on HPC Systems

Bibliographic Details
Published in: 2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 183-190
Main Authors: Peng, Ivy; Pearce, Roger; Gokhale, Maya
Format: Conference Proceeding
Language: English
Published: IEEE 01-09-2020
Description
Summary: Large-scale high-performance computing (HPC) systems consist of massive compute and memory resources tightly coupled in nodes. We perform a large-scale study of memory utilization on four production HPC clusters. Our results show that more than 90% of jobs utilize less than 15% of the node memory capacity, and for 90% of the time, memory utilization is less than 35%. Disaggregated architectures have recently gained traction because they can selectively scale up a resource and improve resource utilization. Based on these observations, we explore using disaggregated memory to support memory-intensive applications, while most jobs remain intact on HPC systems with reduced node memory. We designed and developed a user-space remote-memory paging library to enable applications to explore disaggregated memory on existing HPC clusters. We quantified the impact of access patterns and network connectivity in benchmarks. Our case studies of graph-processing and Monte Carlo applications evaluated the impact of application characteristics and local memory capacity and highlighted the potential of throughput scaling on disaggregated memory.
ISSN: 2643-3001
DOI: 10.1109/SBAC-PAD49847.2020.00034