Search Results - "de Carvalho, Joao P. L."

1
Vectorizing divergent control flow with active-lane consolidation on long-vector architectures by Praharenka, Wyatt, Pankratz, David, De Carvalho, João P. L., Amiri, Ehsan, Amaral, José Nelson

Published in The Journal of supercomputing (01-07-2022)
“…Control-flow divergence limits the applicability of loop vectorization, an important code-transformation that accelerates data-parallel loops. Control-flow…”

Journal Article

2
Compiling for the IBM Matrix Engine for Enterprise Workloads by de Carvalho, Joao P. L., Moreira, Jose E., Amaral, Jose Nelson

Published in IEEE MICRO (01-09-2022)
“…The matrix-multiply assist (MMA) facility is the latest addition to IBM’s Power instruction set architecture and first shipped in the recently introduced…”

Journal Article

3
The Case for Phase-Based Transactional Memory by de Carvalho, Joao P. L., Araujo, Guido, Baldassin, Alexandro

Published in IEEE transactions on parallel and distributed systems (01-02-2019)
“…In recent years, Hybrid TM (HyTM) has been proposed as a transactional memory approach that leverages on the advantages of both hardware (HTM) and software…”

Journal Article

4
DASS: Dynamic Adaptive Sub-Target Specialization by Gobran, Tyler, de Carvalho, Joao P. L., Barton, Christopher

Published in 2023 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW) (17-10-2023)
“…A new microprocessor within a given processor architecture may introduce performance-improving features that either can only be accessed through novel…”

Conference Proceeding

5
Fast matrix multiplication via compiler‐only layered data reorganization and intrinsic lowering by Kuzma, Braedy, Korostelev, Ivan, Carvalho, João P. L., Moreira, José E., Barton, Christopher, Araujo, Guido, Amaral, José Nelson

Published in Software, practice & experience (01-09-2023)
“…The resurgence of machine learning has increased the demand for high‐performance basic linear algebra subroutines (BLAS), which have long depended on libraries…”

Journal Article

6
Pooling Acceleration in the DaVinci Architecture Using Im2col and Col2im Instructions by Rohwedder, Caio S., de Carvalho, Joao P. L., Amaral, Jose Nelson, Araujo, Guido, Colmenares, Giancarlo, Wang, Kai-Ting Amy

Published in 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (01-06-2021)
“…Image-to-column (Im2col) and column-to-image (Col2im) are data transformations extensively used to map convolution to matrix multiplication. These…”

Conference Proceeding

7
Improving Transactional Code Generation via Variable Annotation and Barrier Elision by de Carvalho, Joao P. L., Honorio, Bruno C., Baldassin, Alexandro, Araujo, Guido

Published in 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (01-05-2020)
“…With chip manufacturers such as Intel, IBM and ARM offering native support for transactional memory in their instruction set architectures, memory transactions…”

Conference Proceeding

8
On the impact of mode transition on phased transactional memory performance by Munoz Morales, Catalina, Honorio, Bruno, de Carvalho, Joao P.L., Baldassin, Alexandro, Araujo, Guido

Published in Journal of parallel and distributed computing (01-03-2023)
“…Several transactional memory implementations that employ state-of-the-art software and hardware techniques to deliver performance have been investigated in the…”

Journal Article

9
An efficient parallel implementation for training supervised optimum-path forest classifiers by Culquicondor, Aldo, Baldassin, Alexandro, Castelo-Fernández, Cesar, de Carvalho, João P.L., Papa, João Paulo

Published in Neurocomputing (Amsterdam) (14-06-2020)
“…In this work, we propose and analyze parallel training algorithms for the Optimum-Path Forest (OPF) classifier. We start with a naïve parallelization approach…”

Journal Article

10
Fast Matrix Multiplication via Compiler-only Layered Data Reorganization and Intrinsic Lowering by Kuzma, Braedy, Korostelev, Ivan, de Carvalho, João P. L, Moreira, José E, Barton, Christopher, Araujo, Guido, Amaral, José Nelson

Published 15-05-2023
“…The resurgence of machine learning has increased the demand for high-performance basic linear algebra subroutines (BLAS), which have long depended on libraries…”

Journal Article

11
DOACROSS Parallelization Based on Component Annotation and Loop-Carried Probability by Mattos, Luis, Cesar, Divino, Salamanca, Juan, de Carvalho, Joao P. L., Pereira, Marcio, Araujo, Guido

Published in 2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) (01-09-2018)
“…Although modern compilers implement many loop parallelization techniques, their application is typically restricted to loops that have no loop-carried…”

Conference Proceeding

12
On the Efficiency of Transactional Code Generation: A GCC Case Study by Honorio, Bruno Chinelato, Labegalini de Carvalho, Joao Paulo, Baldassin, Alexandro Jose

Published in 2018 Symposium on High Performance Computing Systems (WSCAD) (01-10-2018)
“…Memory transactions are becoming more popular as chip manufacturers are building native support for their execution. Although current Intel and IBM…”

Conference Proceeding

13
Advancing Direct Convolution using Convolution Slicing Optimization and ISA Extensions by Ferrari, Victor, Sousa, Rafael, Pereira, Marcio, de Carvalho, João P. L, Amaral, José Nelson, Moreira, José, Araujo, Guido

Published 08-03-2023
“…Convolution is one of the most computationally intensive operations that must be performed for machine-learning model inference. A traditional approach to…”

Journal Article

