Search Results - "Hoefler, Torsten"

  1.

    Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads by Zaruba, Florian, Schuiki, Fabian, Hoefler, Torsten, Benini, Luca

    Published in IEEE transactions on computers (01-11-2021)
    “…Data-parallel applications, such as data analytics, machine learning, and scientific computing, are placing an ever-growing demand on floating-point operations…”
    Journal Article
  2.

    Transformations of High-Level Synthesis Codes for High-Performance Computing by de Fine Licht, Johannes, Besta, Maciej, Meierhans, Simon, Hoefler, Torsten

    “…Spatial computing architectures promise a major stride in performance and energy efficiency over the traditional load/store devices currently employed in large…”
    Journal Article
  3.

    Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores by Schuiki, Fabian, Zaruba, Florian, Hoefler, Torsten, Benini, Luca

    Published in IEEE transactions on computers (01-02-2021)
    “…Single-issue processor cores are very energy efficient but suffer from the von Neumann bottleneck, in that they must explicitly fetch and issue the…”
    Journal Article
  4.

    Evaluating the Cost of Atomic Operations on Modern Architectures by Schweizer, Hermann, Besta, Maciej, Hoefler, Torsten

    “…Atomic operations (atomics) such as Compare-and-Swap (CAS) or Fetch-and-Add (FAA) are ubiquitous in parallel programming. Yet, performance tradeoffs between…”
    Conference Proceeding
  5.

    Myths and legends in high-performance computing by Matsuoka, Satoshi, Domke, Jens, Wahib, Mohamed, Drozd, Aleksandr, Hoefler, Torsten

    “…In this thought-provoking article, we discuss certain myths and legends that are folklore among members of the high-performance computing community. We…”
    Journal Article
  6.
  7.

    Reflecting on the Goal and Baseline for Exascale Computing: A Roadmap Based on Weather and Climate Simulations by Schulthess, Thomas C., Bauer, Peter, Wedi, Nils, Fuhrer, Oliver, Hoefler, Torsten, Schär, Christoph

    Published in Computing in science & engineering (01-01-2019)
    “…We present a roadmap towards exascale computing based on true application performance goals. It is based on two state-of-the art European numerical weather…”
    Journal Article
  8.

    Slim fly: a cost effective low-diameter network topology by Besta, Maciej, Hoefler, Torsten

    “…We introduce a high-performance cost-effective network topology called Slim Fly that approaches the theoretically optimal network diameter. Slim Fly is based…”
    Conference Proceeding
  9.

    Benchmarking Data Science: 12 Ways to Lie With Statistics and Performance on Parallel Computers by Hoefler, Torsten

    Published in Computer (Long Beach, Calif.) (01-08-2022)
    “…We humorously discuss 12 fallacies when focusing on compute performance that we have frequently observed in practice. We follow each with a recommendation to…”
    Journal Article
  10.

    Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis by Besta, Maciej, Hoefler, Torsten

    “…Graph neural networks (GNNs) are among the most powerful tools in deep learning. They routinely solve complex problems on unstructured networks, such as node…”
    Journal Article
  11.

    Scaling betweenness centrality using communication-efficient sparse matrix multiplication by Solomonik, Edgar, Besta, Maciej, Vella, Flavio, Hoefler, Torsten

    “…Betweenness centrality (BC) is a crucial graph problem that measures the significance of a vertex by the number of shortest paths leading through it. We…”
    Conference Proceeding
  12.

    A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning by Ben-Nun, Tal, Besta, Maciej, Huber, Simon, Ziogas, Alexandros Nikolaos, Peter, Daniel, Hoefler, Torsten

    “…We introduce Deep500: the first customizable benchmarking infrastructure that enables fair comparison of the plethora of deep learning frameworks, algorithms,…”
    Conference Proceeding
  13.

    Communication-Efficient Jaccard similarity for High-Performance Distributed Genome Comparisons by Besta, Maciej, Kanakagiri, Raghavendra, Mustafa, Harun, Karasikov, Mikhail, Rätsch, Gunnar, Hoefler, Torsten, Solomonik, Edgar

    “…The Jaccard similarity index is an important measure of the overlap of two sets, widely used in machine learning, computational genomics, information…”
    Conference Proceeding
  14.

    An Efficient Algorithm for Sparse Quantum State Preparation by Gleinig, Niels, Hoefler, Torsten

    “…Generating quantum circuits that prepare specific states is an essential part of quantum compilation. Algorithms that solve this problem for general states…”
    Conference Proceeding
  15.

    Characterizing the Influence of System Noise on Large-Scale Applications by Simulation by Hoefler, Torsten, Schneider, Timo, Lumsdaine, Andrew

    “…This paper presents an in-depth analysis of the impact of system noise on large-scale parallel application performance in realistic settings. Our analytical…”
    Conference Proceeding
  16.

    Augment Your Batch: Improving Generalization Through Instance Repetition by Hoffer, Elad, Ben-Nun, Tal, Hubara, Itay, Giladi, Niv, Hoefler, Torsten, Soudry, Daniel

    “…Large-batch SGD is important for scaling training of deep neural networks. However, without fine-tuning hyperparameter schedules, the generalization of the…”
    Conference Proceeding
  17.

    FatPaths: Routing in Supercomputers and Data Centers when Shortest Paths Fall Short by Besta, Maciej, Schneider, Marcel, Konieczny, Marek, Cynk, Karolina, Henriksson, Erik, Girolamo, Salvatore Di, Singla, Ankit, Hoefler, Torsten

    “…We introduce FatPaths: a simple, generic, and robust routing architecture that enables state-of-the-art low-diameter topologies such as Slim Fly to achieve…”
    Conference Proceeding
  18.

    Cache Line Aware Algorithm Design for Cache-Coherent Architectures by Ramos, Sabela, Hoefler, Torsten

    “…The increase in the number of cores per processor and the complexity of memory hierarchies make cache coherence key for programmability of current shared…”
    Journal Article
  19.

    Deep learning for post-processing ensemble weather forecasts by Grönquist, Peter, Yao, Chengyuan, Ben-Nun, Tal, Dryden, Nikoli, Dueben, Peter, Li, Shigang, Hoefler, Torsten

    “…Quantifying uncertainty in weather forecasts is critical, especially for predicting extreme weather events. This is typically accomplished with ensemble…”
    Journal Article
  20.

    Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems by Besta, Maciej, Fischer, Marc, Kalavri, Vasiliki, Kapralov, Michael, Hoefler, Torsten

    “…Graph processing has become an important part of various areas of computing, including machine learning, medical applications, social network analysis,…”
    Journal Article