Search Results - "Hoefler, Torsten"
1
Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads
Published in IEEE transactions on computers (01-11-2021) “…Data-parallel applications, such as data analytics, machine learning, and scientific computing, are placing an ever-growing demand on floating-point operations…”
Journal Article
2
Transformations of High-Level Synthesis Codes for High-Performance Computing
Published in IEEE transactions on parallel and distributed systems (01-05-2021) “…Spatial computing architectures promise a major stride in performance and energy efficiency over the traditional load/store devices currently employed in large…”
Journal Article
3
Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores
Published in IEEE transactions on computers (01-02-2021) “…Single-issue processor cores are very energy efficient but suffer from the von Neumann bottleneck, in that they must explicitly fetch and issue the…”
Journal Article
4
Evaluating the Cost of Atomic Operations on Modern Architectures
Published in 2015 International Conference on Parallel Architecture and Compilation (PACT) (01-10-2015) “…Atomic operations (atomics) such as Compare-and-Swap (CAS) or Fetch-and-Add (FAA) are ubiquitous in parallel programming. Yet, performance tradeoffs between…”
Conference Proceeding
5
Myths and legends in high-performance computing
Published in The international journal of high performance computing applications (01-07-2023) “…In this thought-provoking article, we discuss certain myths and legends that are folklore among members of the high-performance computing community. We…”
Journal Article
6
Kilometer-Scale Climate Models: Prospects and Challenges
Published in Bulletin of the American Meteorological Society (01-05-2020) “…Currently major efforts are underway toward refining the horizontal resolution (or grid spacing) of climate models to about 1 km, using both global and…”
Journal Article
7
Reflecting on the Goal and Baseline for Exascale Computing: A Roadmap Based on Weather and Climate Simulations
Published in Computing in science & engineering (01-01-2019) “…We present a roadmap towards exascale computing based on true application performance goals. It is based on two state-of-the art European numerical weather…”
Journal Article
8
Slim fly: a cost effective low-diameter network topology
Published in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (01-11-2014) “…We introduce a high-performance cost-effective network topology called Slim Fly that approaches the theoretically optimal network diameter. Slim Fly is based…”
Conference Proceeding
9
Benchmarking Data Science: 12 Ways to Lie With Statistics and Performance on Parallel Computers
Published in Computer (Long Beach, Calif.) (01-08-2022) “…We humorously discuss 12 fallacies when focusing on compute performance that we have frequently observed in practice. We follow each with a recommendation to…”
Journal Article
10
Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis
Published in IEEE transactions on pattern analysis and machine intelligence (01-05-2024) “…Graph neural networks (GNNs) are among the most powerful tools in deep learning. They routinely solve complex problems on unstructured networks, such as node…”
Journal Article
11
Scaling betweenness centrality using communication-efficient sparse matrix multiplication
Published in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (12-11-2017) “…Betweenness centrality (BC) is a crucial graph problem that measures the significance of a vertex by the number of shortest paths leading through it. We…”
Conference Proceeding
12
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning
Published in 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (01-05-2019) “…We introduce Deep500: the first customizable benchmarking infrastructure that enables fair comparison of the plethora of deep learning frameworks, algorithms,…”
Conference Proceeding
13
Communication-Efficient Jaccard similarity for High-Performance Distributed Genome Comparisons
Published in 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (01-05-2020) “…The Jaccard similarity index is an important measure of the overlap of two sets, widely used in machine learning, computational genomics, information…”
Conference Proceeding
14
An Efficient Algorithm for Sparse Quantum State Preparation
Published in 2021 58th ACM/IEEE Design Automation Conference (DAC) (05-12-2021) “…Generating quantum circuits that prepare specific states is an essential part of quantum compilation. Algorithms that solve this problem for general states…”
Conference Proceeding
15
Characterizing the Influence of System Noise on Large-Scale Applications by Simulation
Published in 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (01-11-2010) “…This paper presents an in-depth analysis of the impact of system noise on large-scale parallel application performance in realistic settings. Our analytical…”
Conference Proceeding
16
Augment Your Batch: Improving Generalization Through Instance Repetition
Published in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (01-06-2020) “…Large-batch SGD is important for scaling training of deep neural networks. However, without fine-tuning hyperparameter schedules, the generalization of the…”
Conference Proceeding
17
FatPaths: Routing in Supercomputers and Data Centers when Shortest Paths Fall Short
Published in SC20: International Conference for High Performance Computing, Networking, Storage and Analysis (01-11-2020) “…We introduce FatPaths: a simple, generic, and robust routing architecture that enables state-of-the-art low-diameter topologies such as Slim Fly to achieve…”
Conference Proceeding
18
Cache Line Aware Algorithm Design for Cache-Coherent Architectures
Published in IEEE transactions on parallel and distributed systems (01-10-2016) “…The increase in the number of cores per processor and the complexity of memory hierarchies make cache coherence key for programmability of current shared…”
Journal Article
19
Deep learning for post-processing ensemble weather forecasts
Published in Philosophical transactions of the Royal Society of London. Series A: Mathematical, physical, and engineering sciences (05-04-2021) “…Quantifying uncertainty in weather forecasts is critical, especially for predicting extreme weather events. This is typically accomplished with ensemble…”
Journal Article
20
Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems
Published in IEEE transactions on parallel and distributed systems (01-06-2023) “…Graph processing has become an important part of various areas of computing, including machine learning, medical applications, social network analysis,…”
Journal Article