A Quantitative Approach for Adopting Disaggregated Memory in HPC Systems

Bibliographic Details
Published in:	SC23: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-14
Main Authors:	Wahlgren, Jacob; Schieffer, Gabin; Gokhale, Maya; Peng, Ivy
Format:	Conference Proceeding
Language:	English
Published:	ACM 11-11-2023
Subjects:	Costs; Data centers; disaggregated memory; High performance computing; HPC system; Interference; Memory management; multi-tier memory; Prefetching; Processor scheduling

Description
Summary:	Memory disaggregation has recently been adopted in data centers to improve resource utilization, motivated by cost and sustainability. Recent studies on large-scale HPC facilities have also highlighted memory underutilization. A promising and non-disruptive option for memory disaggregation is rack-scale memory pooling, where node-local memory is supplemented by shared memory pools. This work outlines the prospects and requirements for adoption and clarifies several misconceptions. We propose a quantitative method for dissecting application requirements on the memory system from the top down in three levels, moving from general, to multi-tier memory systems, and then to memory pooling. We provide a multi-level profiling tool and LBench to facilitate the quantitative approach. We evaluate a set of representative HPC workloads on an emulated platform. Our results show that prefetching activities can significantly influence memory traffic profiles. Interference in memory pooling has varied impacts on applications, depending on their access ratios to memory tiers and arithmetic intensities. Finally, in two case studies, we show the benefits of our findings at the application and system levels, achieving 50% reduction in remote access and 13% speedup in BFS, and reducing performance variation of co-located workloads in interference-aware job scheduling.
ISSN:	2167-4337
DOI:	10.1145/3581784.3607108