A Fast Scalable Implicit Solver with Concentrated Computation for Nonlinear Time-Evolution Problems on Low-Order Unstructured Finite Elements
Published in: | 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS) pp. 620-629 |
---|---|
Main Authors: | , , , , , , , , , , , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE 01-05-2018 |
Subjects: | |
Summary: | Many supercomputers are shifting to architectures with low ratios of B (byte/s; memory transfer capability) to F (FLOPS capability). However, applications that inherently require large B have difficulty exploiting the increased F. Targeting an implicit unstructured low-order finite-element analysis solver, which typically requires large B, we have developed a concentrated computation algorithm that yields significant performance improvements on low-B/F supercomputers. A sparse matrix-vector multiplication kernel achieved 35.7% of peak performance, and the whole solver achieved 15.6% of peak performance, on the second-generation Xeon Phi-based Oakforest-PACS. This is 5.02 times faster than (and 6.90 times the peak performance of) the state-of-the-art solver (the SC14 Gordon Bell finalist solver). On Oakforest-PACS, the proposed solver was approximately 2.42 times faster than the state-of-the-art solver running on the K computer. The proposed approach has implications for both systems and applications, and is expected to have a significant impact on the various fields that use finite-element methods for nonlinear time-evolution problems. |
ISSN: | 1530-2075 |
DOI: | 10.1109/IPDPS.2018.00071 |