Phoenix: Memory Speed HPC I/O with NVM
Published in: 2016 IEEE 23rd International Conference on High Performance Computing (HiPC), pp. 121-131
Main Authors:
Format: Conference Proceeding
Language: English
Published: IEEE, 01-12-2016
Summary: In order to bridge the gap between applications' I/O needs on future exascale platforms and the capabilities of conventional memory and storage technologies, HPC system designs have started integrating components based on emerging non-volatile memory technologies. Non-volatile memory (NVRAM) provides persistent storage at close to memory speeds, with good capacity scaling, creating opportunities to accelerate I/O in exascale machines. However, naive use of NVRAM devices with current software stacks exposes new bottlenecks due to the limited device bandwidth and slower device access times compared to DRAM. To address this, we propose Phoenix (PHX), an NVRAM-bandwidth-aware object store for persistent objects. PHX achieves efficiency through memory-centric object interfaces and a device access stack specialized for NVRAM. Furthermore, PHX deals with the limited PCM bandwidth through simultaneous use of NVRAM and DRAM devices, thus increasing the effective data movement bandwidth. This reduces the time spent on the critical-path I/O operations associated with the slow NVM device. To continue guaranteeing adequate reliability for the persistent objects, DRAM-resident object state is replicated across peer nodes' memory, accessible through high-bandwidth interconnects. Furthermore, PHX minimizes the data movement overheads due to additional data copies by using a cost model that considers device bandwidths, remote storage distance, and energy costs. Experimental analysis using real-world HPC applications on emulated NVRAM hardware shows that Phoenix's controlled use of node-local and remote-node memory bandwidth delivers up to ~1.2×, ~2×, and ~12× speed-ups for checkpoint I/O for the S3D, CM1, and GTC HPC applications, respectively. Furthermore, PHX reduces the total simulation checkpoint overhead of GTC by up to ~18%.
DOI: 10.1109/HiPC.2016.023
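
The abstract describes PHX splitting persistent object data between node-local NVRAM and remotely replicated DRAM according to a cost model over device bandwidths, remote storage distance, and energy. The paper's actual interfaces and model are not reproduced here; the following is a minimal illustrative sketch in C, assuming a simplified cost model that only balances write bandwidths. All names, struct fields, and bandwidth numbers are hypothetical, not part of PHX.

```c
/* Illustrative sketch only: split a checkpoint between node-local NVRAM and
 * DRAM (replicated to a peer node) so that both write paths finish at roughly
 * the same time. Bandwidth numbers are assumptions, not measured values. */
#include <stdio.h>

typedef struct {
    double nvram_bw_gbps;   /* local NVRAM write bandwidth              */
    double dram_bw_gbps;    /* local DRAM write bandwidth               */
    double net_bw_gbps;     /* interconnect bandwidth to the peer node  */
} device_costs;

/* Fraction of the object kept in DRAM. The DRAM path is bounded by the slower
 * of local DRAM bandwidth and the network link used for peer replication;
 * equalizing the two paths' write times gives f = dram_path / (dram_path + nvram). */
static double dram_fraction(const device_costs *c)
{
    double dram_path = c->dram_bw_gbps < c->net_bw_gbps
                         ? c->dram_bw_gbps : c->net_bw_gbps;
    return dram_path / (dram_path + c->nvram_bw_gbps);
}

int main(void)
{
    device_costs c = { .nvram_bw_gbps = 2.0,   /* assumed PCM-like device  */
                       .dram_bw_gbps  = 12.0,
                       .net_bw_gbps   = 5.0 };
    double obj_gb = 8.0;                       /* example checkpoint size  */
    double f = dram_fraction(&c);

    printf("DRAM (peer-replicated) share: %.1f GB, NVRAM share: %.1f GB\n",
           f * obj_gb, (1.0 - f) * obj_gb);
    printf("Estimated write time: %.2f s\n",
           (1.0 - f) * obj_gb / c.nvram_bw_gbps);
    return 0;
}
```

With the assumed numbers, roughly 5.7 GB of an 8 GB checkpoint would take the DRAM/remote path and 2.3 GB would go to NVRAM, so both paths complete in about 1.1 s instead of the 4 s a pure NVRAM write would need. The actual PHX cost model additionally weighs remote storage distance and energy costs, which this sketch omits.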