Search Results - "Hager, Georg"
1
CRAFT: A Library for Easier Application-Level Checkpoint/Restart and Automatic Fault Tolerance
Published in IEEE transactions on parallel and distributed systems (01-03-2019)“…In order to efficiently use the future generations of supercomputers, fault tolerance and power consumption are two of the prime challenges anticipated by the…”
Journal Article
2
LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments
Published in 2010 39th International Conference on Parallel Processing Workshops (01-09-2010)“…Exploiting the performance of today's processors requires intimate knowledge of the microarchitecture as well as an awareness of the ever-growing complexity in…”
Conference Proceeding
3
High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations
Published in Journal of computational physics (15-11-2016)“…We study Chebyshev filter diagonalization as a tool for the computation of many interior eigenvalues of very large sparse symmetric matrices. In this technique…”
Journal Article
4
Comparison of different propagation steps for lattice Boltzmann methods
Published in Computers & mathematics with applications (1987) (01-03-2013)“…Several possibilities exist to implement the propagation step of lattice Boltzmann methods. This paper describes common implementations and compares the number…”
Journal Article
5
A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU–CPU clusters
Published in Parallel computing (01-09-2011)“…► We investigate performance and scaling behavior of a LBM solver on GPU–CPU clusters. ► Based on hardware models performance estimations for GPUs and CPUs are…”
Journal Article
6
Pushing the limits for medical image reconstruction on recent standard multicore processors
Published in The international journal of high performance computing applications (01-05-2013)“…Volume reconstruction by backprojection is the computational bottleneck in many interventional clinical computed tomography (CT) applications. Today vendors in…”
Journal Article
7
Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects
Published in Japan journal of industrial and applied mathematics (01-07-2019)“…We first briefly report on the status and recent achievements of the ELPA-AEO (Eigen value Solvers for Petaflop Applications—Algorithmic Extensions and…”
Journal Article
8
The Role of Idle Waves, Desynchronization, and Bottleneck Evasion in the Performance of Parallel Programs
Published in IEEE transactions on parallel and distributed systems (01-02-2023)“…The performance of highly parallel applications on distributed-memory systems is influenced by many factors. Analytic performance modeling techniques aim to…”
Journal Article
9
Analytic performance model for parallel overlapping memory‐bound kernels
Published in Concurrency and computation (01-05-2022)“…Complex applications running on multicore processors show a rich performance phenomenology. The growing number of cores per ccNUMA domain complicates…”
Journal Article
10
Level-Based Blocking for Sparse Matrices: Sparse Matrix-Power-Vector Multiplication
Published in IEEE transactions on parallel and distributed systems (01-02-2023)“…The multiplication of a sparse matrix with a dense vector (SpMV) is a key component in many numerical schemes and its performance is known to be severely…”
Journal Article
11
Performance Modeling of Streaming Kernels and Sparse Matrix-Vector Multiplication on A64FX
Published in 2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) (01-11-2020)“…The A64FX CPU powers the current #1 supercomputer on the Top500 list. Although it is a traditional cache-based multicore processor, its peak performance and…”
Conference Proceeding
12
Making applications faster by asynchronous execution: Slowing down processes or relaxing MPI collectives
Published in Future generation computer systems (01-11-2023)“…Comprehending the performance bottlenecks at the core of the intricate hardware–software interactions exhibited by highly parallel programs on HPC clusters is…”
Journal Article
13
Analytical performance estimation during code generation on modern GPUs
Published in Journal of parallel and distributed computing (01-03-2023)“…Automatic code generation is frequently used to create implementations of algorithms specifically tuned to particular hardware and application parameters. The…”
Journal Article
14
Exploring performance and power properties of modern multi-core chips via simple machine models
Published in Concurrency and computation (01-02-2016)“…Modern multi‐core chips show complex behavior with respect to performance and power. Starting with the Intel Sandy Bridge processor, it has become…”
Journal Article
15
Algebraic temporal blocking for sparse iterative solvers on multi-core CPUs
Published in The international journal of high performance computing applications (25-09-2024)“…Sparse linear iterative solvers are essential for many large-scale simulations. Much of the runtime of these solvers is often spent in the implicit evaluation…”
Journal Article
16
Performance engineering for real and complex tall & skinny matrix multiplication kernels on GPUs
Published in The international journal of high performance computing applications (01-01-2021)“…General matrix-matrix multiplications with double-precision real and complex entries (DGEMM and ZGEMM) in vendor-supplied BLAS libraries are best optimized for…”
Journal Article
17
Execution‐Cache‐Memory modeling and performance tuning of sparse matrix‐vector multiplication and Lattice quantum chromodynamics on A64FX
Published in Concurrency and computation (10-09-2022)“…The A64FX CPU is arguably the most powerful Arm‐based processor design to date. Although it is a traditional cache‐based multicore processor, its peak…”
Journal Article
18
Electron confinement in graphene with gate-defined quantum dots
Published in Physica Status Solidi. B: Basic Solid State Physics (01-08-2015)“…We theoretically analyse the possibility to electrostatically confine electrons in circular quantum dot arrays, impressed on contacted graphene nanoribbons by…”
Journal Article
19
A domain-specific language and matrix-free stencil code for investigating electronic properties of Dirac and topological materials
Published in The international journal of high performance computing applications (01-01-2021)“…We introduce PVSC-DTM (Parallel Vectorized Stencil Code for Dirac and Topological Materials), a library and code generator based on a domain-specific language…”
Journal Article
20
Chip-level and multi-node analysis of energy-optimized lattice Boltzmann CFD simulations
Published in Concurrency and computation (01-05-2016)“…Memory‐bound algorithms show complex performance and energy consumption behavior on multicore processors. We choose the lattice Boltzmann method on an…”
Journal Article