Long-Range MD Electrostatics Force Computation on FPGAs

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, Vol. 35, no. 10, pp. 1690-1707
Main Authors: Bandara, Sahan; Ducimo, Anthony; Wu, Chunshu; Herbordt, Martin
Format: Journal Article
Language: English
Published: IEEE 01-10-2024
Description
Summary: Strong scaling of long-range electrostatic force computation, which is a central concern of long timescale molecular dynamics simulations, is challenging for CPUs and GPUs due to its complex communication structure and global communication requirements. The scalability challenge is seen especially in small simulations of tens to hundreds of thousands of atoms that are of interest to many important applications such as physics-driven drug discovery. FPGA clusters, with their direct, tightly coupled, low-latency interconnects, are able to address these requirements. For FPGA MD clusters to be effective, however, single device performance must also be competitive. In this work, we leverage the inherent benefits of FPGAs to implement a long-range electrostatic force computation architecture. We present an overall framework with numerous algorithmic, mapping, and architecture innovations, including a unified interleaved memory, a spatial scheduling algorithm, and a design for seamless integration with the larger MD system. We examine a number of alternative configurations based on different resource allocation strategies and user parameters. We show that the best configuration of this architecture, implemented on an Intel Agilex FPGA, can achieve 2124 ns and 287 ns of simulated time per day of wall-clock time for the two molecular dynamics benchmarks DHFR and ApoA1; simulating 23K and 92K particles, respectively.
ISSN: 1045-9219, 1558-2183
DOI: 10.1109/TPDS.2024.3434347