Search Results - "de Carvalho, Joao P. L."
1
Vectorizing divergent control flow with active-lane consolidation on long-vector architectures
Published in The Journal of supercomputing (01-07-2022) “…Control-flow divergence limits the applicability of loop vectorization, an important code-transformation that accelerates data-parallel loops. Control-flow…”
Journal Article
2
Compiling for the IBM Matrix Engine for Enterprise Workloads
Published in IEEE MICRO (01-09-2022) “…The matrix-multiply assist (MMA) facility is the latest addition to IBM’s Power instruction set architecture and first shipped in the recently introduced…”
Journal Article
3
The Case for Phase-Based Transactional Memory
Published in IEEE transactions on parallel and distributed systems (01-02-2019) “…In recent years, Hybrid TM (HyTM) has been proposed as a transactional memory approach that leverages on the advantages of both hardware (HTM) and software…”
Journal Article
4
DASS: Dynamic Adaptive Sub-Target Specialization
Published in 2023 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW) (17-10-2023) “…A new microprocessor within a given processor architecture may introduce performance-improving features that either can only be accessed through novel…”
Conference Proceeding
5
Fast matrix multiplication via compiler‐only layered data reorganization and intrinsic lowering
Published in Software, practice & experience (01-09-2023) “…The resurgence of machine learning has increased the demand for high‐performance basic linear algebra subroutines (BLAS), which have long depended on libraries…”
Journal Article
6
Pooling Acceleration in the DaVinci Architecture Using Im2col and Col2im Instructions
Published in 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (01-06-2021) “…Image-to-column (Im2col) and column-to-image (Col2im) are data transformations extensively used to map convolution to matrix multiplication. These…”
Conference Proceeding
7
Improving Transactional Code Generation via Variable Annotation and Barrier Elision
Published in 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (01-05-2020) “…With chip manufacturers such as Intel, IBM and ARM offering native support for transactional memory in their instruction set architectures, memory transactions…”
Conference Proceeding
8
On the impact of mode transition on phased transactional memory performance
Published in Journal of parallel and distributed computing (01-03-2023) “…Several transactional memory implementations that employ state-of-the-art software and hardware techniques to deliver performance have been investigated in the…”
Journal Article
9
An efficient parallel implementation for training supervised optimum-path forest classifiers
Published in Neurocomputing (Amsterdam) (14-06-2020) “…In this work, we propose and analyze parallel training algorithms for the Optimum-Path Forest (OPF) classifier. We start with a naïve parallelization approach…”
Journal Article
10
Fast Matrix Multiplication via Compiler-only Layered Data Reorganization and Intrinsic Lowering
Published 15-05-2023 “…The resurgence of machine learning has increased the demand for high-performance basic linear algebra subroutines (BLAS), which have long depended on libraries…”
Journal Article
11
DOACROSS Parallelization Based on Component Annotation and Loop-Carried Probability
Published in 2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) (01-09-2018) “…Although modern compilers implement many loop parallelization techniques, their application is typically restricted to loops that have no loop-carried…”
Conference Proceeding
12
On the Efficiency of Transactional Code Generation: A GCC Case Study
Published in 2018 Symposium on High Performance Computing Systems (WSCAD) (01-10-2018) “…Memory transactions are becoming more popular as chip manufacturers are building native support for their execution. Although current Intel and IBM…”
Conference Proceeding
13
Advancing Direct Convolution using Convolution Slicing Optimization and ISA Extensions
Published 08-03-2023 “…Convolution is one of the most computationally intensive operations that must be performed for machine-learning model inference. A traditional approach to…”
Journal Article