Search Results - "Kwasniewski, Grzegorz"
1
Using Compiler Techniques to Improve Automatic Performance Modeling
Published in 2015 International Conference on Parallel Architecture and Compilation (PACT) (01-10-2015)“…Performance modeling can be utilized in a number of scenarios, starting from finding performance bugs to the scalability study of applications. Existing…”
Conference Proceeding -
2
Near-global climate simulation at 1 km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0
Published in Geoscientific Model Development (02-05-2018)“…The best hope for reducing long-standing global climate model biases is by increasing resolution to the kilometer scale. Here we present results from an…”
Journal Article -
3
High Performance Unstructured SpMM Computation Using Tensor Cores
Published 21-08-2024“…High-performance sparse matrix-matrix (SpMM) multiplication is paramount for science and industry, as the ever-increasing sizes of data prohibit using dense…”
Journal Article -
4
Deinsum: Practically I/O Optimal Multi-Linear Algebra
Published in SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (01-11-2022)“…Multilinear algebra kernel performance on modern massively-parallel systems is determined mainly by data movement. However, deriving data movement-optimal…”
Conference Proceeding -
5
SeBS: A Serverless Benchmark Suite for Function-as-a-Service Computing
Published 28-12-2020“…Function-as-a-Service (FaaS) is one of the most promising directions for the future of cloud services, and serverless functions have immediately become a new…”
Journal Article -
6
Deinsum: Practically I/O Optimal Multilinear Algebra
Published 16-06-2022“…Multilinear algebra kernel performance on modern massively-parallel systems is determined mainly by data movement. However, deriving data movement-optimal…”
Journal Article -
7
On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations
Published 25-04-2023“…Published at Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November, 2021(SC'21) Matrix…”
Journal Article -
8
ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations
Published in SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (01-11-2022)“…Important graph mining problems such as Clustering are computationally demanding. To significantly accelerate these problems, we propose ProbGraph: a graph…”
Conference Proceeding -
9
Flexible Communication Avoiding Matrix Multiplication on FPGA with High-Level Synthesis
Published 25-01-2021“…In Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA'20), February 23-25, 2020, Seaside, CA, USA Data movement…”
Journal Article -
10
On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations
Published in SC21: International Conference for High Performance Computing, Networking, Storage and Analysis (14-11-2021)“…Matrix factorizations are among the most important building blocks of scientific computing. However, state-of-the-art libraries are not communication-optimal,…”
Conference Proceeding -
11
On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal LU Factorization
Published 12-10-2020“…Dense linear algebra kernels, such as linear solvers or tensor contractions, are fundamental components of many scientific computing applications. In this…”
Journal Article -
12
Lifting C Semantics for Dataflow Optimization
Published 24-05-2022“…C is the lingua franca of programming and almost any device can be programmed using C. However, programming modern heterogeneous architectures such as…”
Journal Article -
13
Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O Lower Bounds for Statically Analyzable Programs
Published 15-05-2021“…Determining I/O lower bounds is a crucial step in obtaining communication-efficient parallel algorithms, both across the memory hierarchy and between…”
Journal Article -
14
High-Performance and Programmable Attentional Graph Neural Networks with Global Tensor Formulations
Published in SC23: International Conference for High Performance Computing, Networking, Storage and Analysis (11-11-2023)“…Graph attention models (A-GNNs), a type of Graph Neural Networks (GNNs), have been shown to be more powerful than simpler convolutional GNNs (C-GNNs). However,…”
Conference Proceeding -
15
ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations
Published 24-08-2022“…Proceedings of the ACM/IEEE International Conference on High Performance Computing, Networking, Storage and Analysis, November 2022 Important graph mining…”
Journal Article -
16
A scalable weakly-synchronous algorithm for solving partial differential equations
Published 13-11-2019“…Synchronization overheads pose a major challenge as applications advance towards extreme scales. In current large-scale algorithms, synchronization as well as…”
Journal Article -
17
Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication
Published 26-08-2019“…We propose COSMA: a parallel matrix-matrix multiplication algorithm that is near communication-optimal for all combinations of matrix dimensions, processor…”
Journal Article -
18
Motif Prediction with Graph Neural Networks
Published 26-05-2021“…Proceedings of the 28th SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'22), 2022 Link prediction is one of the central problems in graph mining…”
Journal Article -
19
A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers
Published in SC16: International Conference for High Performance Computing, Networking, Storage and Analysis (01-11-2016)“…MeteoSwiss, the Swiss national weather forecast institute, has selected densely populated accelerator servers as their primary system to compute weather…”
Conference Proceeding -
20
SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems
Published 15-04-2021“…Simple graph algorithms such as PageRank have been the target of numerous hardware accelerators. Yet, there also exist much more complex graph mining…”
Journal Article