Search Results - "de Carvalho, Joao P. L."

  Showing 1 - 13 of 13 results
  1.

    Vectorizing divergent control flow with active-lane consolidation on long-vector architectures by Praharenka, Wyatt, Pankratz, David, De Carvalho, João P. L., Amiri, Ehsan, Amaral, José Nelson

    Published in The Journal of supercomputing (01-07-2022)
    “…Control-flow divergence limits the applicability of loop vectorization, an important code-transformation that accelerates data-parallel loops. Control-flow…”
    Journal Article
  2.

    Compiling for the IBM Matrix Engine for Enterprise Workloads by de Carvalho, Joao P. L., Moreira, Jose E., Amaral, Jose Nelson

    Published in IEEE MICRO (01-09-2022)
    “…The matrix-multiply assist (MMA) facility is the latest addition to IBM’s power instruction set architecture and first shipped in the recently introduced…”
    Journal Article
  3.

    The Case for Phase-Based Transactional Memory by de Carvalho, Joao P. L., Araujo, Guido, Baldassin, Alexandro

    “…In recent years, Hybrid TM (HyTM) has been proposed as a transactional memory approach that leverages on the advantages of both hardware (HTM) and software…”
    Journal Article
  4.

    DASS: Dynamic Adaptive Sub-Target Specialization by Gobran, Tyler, de Carvalho, Joao P. L., Barton, Christopher

    “…A new microprocessor within a given processor architecture may introduce performance-improving features that either can only be accessed through novel…”
    Conference Proceeding
  5.

    Fast matrix multiplication via compiler‐only layered data reorganization and intrinsic lowering by Kuzma, Braedy, Korostelev, Ivan, Carvalho, João P. L., Moreira, José E., Barton, Christopher, Araujo, Guido, Amaral, José Nelson

    Published in Software, practice & experience (01-09-2023)
    “…The resurgence of machine learning has increased the demand for high‐performance basic linear algebra subroutines (BLAS), which have long depended on libraries…”
    Journal Article
  6.

    Pooling Acceleration in the DaVinci Architecture Using Im2col and Col2im Instructions by Rohwedder, Caio S., de Carvalho, Joao P. L., Amaral, Jose Nelson, Araujo, Guido, Colmenares, Giancarlo, Wang, Kai-Ting Amy

    “…Image-to-column (Im2col) and column-to-image (Col2im) are data transformations extensively used to map convolution to matrix multiplication. These…”
    Conference Proceeding
  7.

    Improving Transactional Code Generation via Variable Annotation and Barrier Elision by de Carvalho, Joao P. L., Honorio, Bruno C., Baldassin, Alexandro, Araujo, Guido

    “…With chip manufacturers such as Intel, IBM and ARM offering native support for transactional memory in their instruction set architectures, memory transactions…”
    Conference Proceeding
  8.

    On the impact of mode transition on phased transactional memory performance by Munoz Morales, Catalina, Honorio, Bruno, de Carvalho, Joao P.L., Baldassin, Alexandro, Araujo, Guido

    “…Several transactional memory implementations that employ state-of-the-art software and hardware techniques to deliver performance have been investigated in the…”
    Journal Article
  9.

    An efficient parallel implementation for training supervised optimum-path forest classifiers by Culquicondor, Aldo, Baldassin, Alexandro, Castelo-Fernández, Cesar, de Carvalho, João P.L., Papa, João Paulo

    Published in Neurocomputing (Amsterdam) (14-06-2020)
    “…In this work, we propose and analyze parallel training algorithms for the Optimum-Path Forest (OPF) classifier. We start with a naïve parallelization approach…”
    Journal Article
  10.

    Fast Matrix Multiplication via Compiler-only Layered Data Reorganization and Intrinsic Lowering by Kuzma, Braedy, Korostelev, Ivan, de Carvalho, João P. L, Moreira, José E, Barton, Christopher, Araujo, Guido, Amaral, José Nelson

    Published 15-05-2023
    “…The resurgence of machine learning has increased the demand for high-performance basic linear algebra subroutines (BLAS), which have long depended on libraries…”
    Journal Article
  11.

    DOACROSS Parallelization Based on Component Annotation and Loop-Carried Probability by Mattos, Luis, Cesar, Divino, Salamanca, Juan, de Carvalho, Joao P. L., Pereira, Marcio, Araujo, Guido

    “…Although modern compilers implement many loop parallelization techniques, their application is typically restricted to loops that have no loop-carried…”
    Conference Proceeding
  12.

    On the Efficiency of Transactional Code Generation: A GCC Case Study by Honorio, Bruno Chinelato, Labegalini de Carvalho, Joao Paulo, Baldassin, Alexandro Jose

    “…Memory transactions are becoming more popular as chip manufacturers are building native support for their execution. Although current Intel and IBM…”
    Conference Proceeding
  13.

    Advancing Direct Convolution using Convolution Slicing Optimization and ISA Extensions by Ferrari, Victor, Sousa, Rafael, Pereira, Marcio, de Carvalho, João P. L, Amaral, José Nelson, Moreira, José, Araujo, Guido

    Published 08-03-2023
    “…Convolution is one of the most computationally intensive operations that must be performed for machine-learning model inference. A traditional approach to…”
    Journal Article