Search Results - "Cintra, Marcelo"
1
ATOM: Atomic Durability in Non-volatile Memory through Hardware Logging
Published in 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) (01-02-2017) “…Non-volatile memory (NVM) is emerging as a fast byte-addressable alternative for storing persistent data. Ensuring atomic durability in NVM requires logging…”
Conference Proceeding -
2
DHTM: Durable Hardware Transactional Memory
Published in 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA) (01-06-2018) “…The emergence of byte-addressable persistent (non-volatile) memory provides a low latency and high bandwidth path to durability. However, programmers need…”
Conference Proceeding -
3
Efficient persist barriers for multicores
Published in 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (01-12-2015) “…Emerging non-volatile memory technologies enable fast, fine-grained persistence compared to slow block-based devices. In order to ensure consistency of…”
Conference Proceeding -
4
Generating code for holistic query evaluation
Published in 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010) (01-03-2010) “…We present the application of customized code generation to database query evaluation. The idea is to use a collection of highly efficient code templates and…”
Conference Proceeding -
5
An Evaluation of an OS-Based Coherence Scheme for Tiled CMPs
Published in International journal of parallel programming (01-06-2011) “…The interconnect mechanisms (shared bus or crossbar) used in current chip-multiprocessors (CMPs) are expected to become a bottleneck that prevents these…”
Journal Article -
6
Adaptive Selection of Cache Indexing Bits for Removing Conflict Misses
Published in IEEE transactions on computers (01-06-2015) “…The design of cache memories is a crucial part of the design cycle of a modern processor, since they are able to bridge the performance gap between the…”
Journal Article -
7
Software-Based Cache Coherence with Hardware-Assisted Selective Self-Invalidations Using Bloom Filters
Published in IEEE transactions on computers (01-04-2011) “…Implementing shared memory consistency models on top of hardware caches gives rise to the well-known cache coherence problem. The standard solution involves…”
Journal Article -
8
Just-in-Time Compilation Techniques for Hardware/Software Co-Designed Processors
Published 01-01-2015 “…Recently, with the broad adoption of mobile devices, considerable research efforts have concentrated on innovative dynamic optimization techniques to improve…”
Dissertation -
9
Handling branches in TLS systems with Multi-Path Execution
Published in HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture (01-01-2010) “…Thread-Level Speculation (TLS) has been proposed to facilitate the extraction of parallel threads from sequential applications. Most prior work on TLS has…”
Conference Proceeding -
10
Complementing user-level coarse-grain parallelism with implicit speculative parallelism
Published in 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (01-12-2011) “…Multi-core and many-core systems are the norm in contemporary processor technology and are expected to remain so for the foreseeable future. Programs using…”
Conference Proceeding -
11
An OS-based alternative to full hardware coherence on tiled CMPs
Published in 2008 IEEE 14th International Symposium on High Performance Computer Architecture (01-02-2008) “…The interconnect mechanisms (shared bus or crossbar) used in current chip-multiprocessors (CMPs) are expected to become a bottleneck that prevents these…”
Conference Proceeding -
12
Automatic Skeleton-Driven Memory Affinity for Transactional Worklist Applications
Published in International journal of parallel programming (2014) “…Memory affinity has become a key element to achieve scalable performance on multi-core platforms. Mechanisms such as thread scheduling, page allocation and…”
Journal Article -
13
Phase-Based Application-Driven Hierarchical Power Management on the Single-chip Cloud Computer
Published in 2011 International Conference on Parallel Architectures and Compilation Techniques (01-10-2011) “…To improve energy efficiency processors allow for Dynamic Voltage and Frequency Scaling (DVFS), which enables changing their performance and power consumption…”
Conference Proceeding -
14
Design space exploration of a software speculative parallelization scheme
Published in IEEE transactions on parallel and distributed systems (01-06-2005) “…With speculative parallelization, code sections that cannot be fully analyzed by the compiler are optimistically executed in parallel. Hardware schemes are…”
Journal Article -
15
Autotuning Skeleton-Driven Optimizations for Transactional Worklist Applications
Published in IEEE transactions on parallel and distributed systems (01-12-2012) “…Skeleton or pattern-based programming allows parallel programs to be expressed as specialized instances of generic communication and computation patterns. In…”
Journal Article -
16
Automatic Skeleton-Driven Memory Affinity for Transactional Worklist Applications
Published in International journal of parallel programming (01-04-2014) “…Memory affinity has become a key element to achieve scalable performance on multi-core platforms. Mechanisms such as thread scheduling, page allocation and…”
Journal Article -
17
A machine learning-based approach for thread mapping on transactional memory applications
Published in 2011 18th International Conference on High Performance Computing (01-12-2011) “…Thread mapping has been extensively used as a technique to efficiently exploit memory hierarchy on modern chip-multiprocessors. It places threads on cores in…”
Conference Proceeding -
18
Increasing the energy efficiency of TLS systems using intermediate checkpointing
Published in 2011 18th International Conference on High Performance Computing (01-12-2011) “…With the advent of Chip Multiprocessors (CMPs), improving performance relies on the programmers/compilers to expose thread level parallelism to the underlying…”
Conference Proceeding -
19
CAeSaR: Unified cluster-assignment scheduling and communication reuse for clustered VLIW processors
Published in 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES) (01-09-2013) “…Clustered architectures have been proposed as a solution to the scalability problem of wide ILP processors. VLIW architectures, being wide-issue by design,…”
Conference Proceeding -
20
Distance-aware round-robin mapping for large NUCA caches
Published in 2009 International Conference on High Performance Computing (HiPC) (01-12-2009) “…In many-core architectures, memory blocks are commonly assigned to the banks of a NUCA cache by following a physical mapping. This mapping assigns blocks to…”
Conference Proceeding