Porting the grid-based 3D+3V hybrid-Vlasov kinetic plasma simulation Vlasiator to heterogeneous GPU architectures

Bibliographic Details
Main Authors:	Battarbee, Markus; Papadakis, Konstantinos; Ganse, Urs; Hokkanen, Jaro; Kotipalo, Leo; Pfau-Kempf, Yann; Alho, Markku; Palmroth, Minna
Format:	Journal Article
Language:	English
Published:	2024-06-04
Subjects:	Physics - Computational Physics; Physics - Plasma Physics; Physics - Space Physics

Description
Summary:	Vlasiator is a space plasma simulation code that models near-Earth ion-kinetic dynamics in three spatial and three velocity dimensions. It is highly parallelized and solves the Vlasov equation directly through the distribution function, discretized on a Cartesian grid, instead of using the more common particle-in-cell approach. In near-Earth space, plasma properties span several orders of magnitude in temperature, density, and magnetic field strength. To fit the required six-dimensional grids in memory, Vlasiator uses a sparse block-based velocity mesh, in which chunks of velocity space are added or deleted based on the advection requirements of the Vlasov solver. In addition, the spatial mesh is adaptively refined through cell-based octree refinement. In this paper, we describe the design choices made in porting Vlasiator to heterogeneous CPU/GPU architectures. We detail the memory management, algorithmic changes, and kernel construction, as well as our unified codebase approach, which makes the code portable to both NVIDIA and AMD hardware (CUDA and HIP languages, respectively). In particular, we showcase a highly parallel block adjustment approach that allows efficient re-ordering of a sparse velocity mesh. We detail pitfalls we have overcome and lay out a plan for optimization to facilitate future exascale simulations using multi-node GPU supercomputing.
DOI:	10.48550/arxiv.2406.02201
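
The sparse velocity mesh described in the summary can be illustrated with a small serial sketch. This is not the paper's GPU implementation, which is described as highly parallel. It only shows the general idea of keeping blocks that hold content, adding empty neighbours for advection to flow into, and deleting the rest. The block size of 4 x 4 x 4 cells, the `threshold` parameter, the one-block halo rule, and the names `adjust_blocks` and `neighbours` are all assumptions made for illustration.

```python
from itertools import product

BLOCK_CELLS = 4 ** 3  # assumed block size: 4 x 4 x 4 velocity cells


def neighbours(idx):
    """Yield the block index itself and its 26 neighbours in velocity space."""
    i, j, k = idx
    for di, dj, dk in product((-1, 0, 1), repeat=3):
        yield (i + di, j + dj, k + dk)


def adjust_blocks(mesh, threshold):
    """Return a new sparse mesh holding only the blocks worth keeping.

    mesh maps a block index (i, j, k) to a flat list of BLOCK_CELLS values.
    threshold is a hypothetical sparsity cut-off, not a value from the paper.
    """
    # Blocks with at least one cell above the threshold carry content.
    with_content = {idx for idx, cells in mesh.items() if max(cells) > threshold}

    # Keep content blocks plus a one-block halo so the distribution can
    # advect into neighbouring velocity space on the next step.
    required = set()
    for idx in with_content:
        required.update(neighbours(idx))

    adjusted = {}
    for idx in sorted(required):
        # Existing blocks keep their data; newly added halo blocks start empty.
        adjusted[idx] = mesh.get(idx, [0.0] * BLOCK_CELLS)
    return adjusted


if __name__ == "__main__":
    mesh = {
        (0, 0, 0): [1.0] * BLOCK_CELLS,    # core block with content
        (5, 5, 5): [1e-20] * BLOCK_CELLS,  # isolated block below the threshold
    }
    adjusted = adjust_blocks(mesh, threshold=1e-15)
    print(len(adjusted))          # 27: the core block plus its 26 halo blocks
    print((5, 5, 5) in adjusted)  # False: the isolated block was deleted
```

A dictionary keyed by block index stands in for the sparse mesh here because it makes additions and deletions trivial to read. A GPU port would more plausibly need flat, contiguous storage and a parallel way to compute the required set, which is the kind of re-ordering problem the summary refers to.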