Search Results - "Feliu, Josué"
1
Speculative inter-thread store-to-load forwarding in SMT architectures
Published in Journal of parallel and distributed computing (01-03-2023)
“…Applications running on out-of-order cores have benefited for decades of store-to-load forwarding which accelerates communication of store values to loads of…”
Journal Article
2
DeepP: Deep Learning Multi-Program Prefetch Configuration for the IBM POWER 8
Published in IEEE transactions on computers (01-10-2022)
“…Current multi-core processors implement sophisticated hardware prefetchers, that can be configured by application (PID), to improve the system performance…”
Journal Article
3
Perf&Fair: A Progress-Aware Scheduler to Enhance Performance and Fairness in SMT Multicores
Published in IEEE transactions on computers (01-05-2017)
“…Nowadays, high performance multicore processors implement multithreading capabilities. The processes running concurrently on these processors are continuously…”
Journal Article
4
Cloud White: Detecting and Estimating QoS Degradation of Latency-Critical Workloads in the Public Cloud
Published in Future generation computer systems (01-01-2023)
“…The increasing popularity of cloud computing has forced cloud providers to build economies of scale to meet the growing demand. Nowadays, data-centers include…”
Journal Article
5
Effect of Hyper-Threading in Latency-Critical Multithreaded Cloud Applications and Utilization Analysis of the Major System Resources
Published in Future generation computer systems (01-06-2022)
“…Multithreaded latency-critical applications represent an important subset of workloads running on public cloud systems. Most of these systems deploy powerful…”
Journal Article
6
Designing lab sessions focusing on real processors for computer architecture courses: A practical perspective
Published in Journal of parallel and distributed computing (01-08-2018)
“…Computer architecture courses typically include lab sessions to reinforce, from a practical perspective, concepts and architectural mechanisms studied in…”
Journal Article
7
Improving IBM POWER8 Performance Through Symbiotic Job Scheduling
Published in IEEE transactions on parallel and distributed systems (01-10-2017)
“…Symbiotic job scheduling, i.e., scheduling applications that co-run well together on a core, can have a considerable impact on the performance of processors…”
Journal Article
8
Thread-to-Core Allocation in ARM Processors Building Synergistic Pairs
Published in 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT) (21-10-2023)
“…Simultaneous multithreading (SMT) processors can present significant throughput improvements over single-threaded (ST) processors thanks to sharing internal…”
Conference Proceeding
9
Cache-Hierarchy Contention-Aware Scheduling in CMPs
Published in IEEE transactions on parallel and distributed systems (01-03-2014)
“…To improve chip multiprocessor (CMP) performance, recent research has focused on scheduling strategies to mitigate main memory bandwidth contention. Nowadays,…”
Journal Article
10
Rebasing Microarchitectural Research with Industry Traces
Published in 2023 IEEE International Symposium on Workload Characterization (IISWC) (01-10-2023)
“…Microarchitecture research relies on performance models with various degrees of accuracy and speed. In the past few years, one such model, ChampSim, has…”
Conference Proceeding
11
Thread Isolation to Improve Symbiotic Scheduling on SMT Multicore Processors
Published in IEEE transactions on parallel and distributed systems (01-02-2020)
“…Resource sharing is a critical issue in simultaneous multithreading (SMT) processors as threads running simultaneously on an SMT core compete for shared…”
Journal Article
12
VMT: Virtualized Multi-Threading for Accelerating Graph Workloads on Commodity Processors
Published in IEEE transactions on computers (01-06-2022)
“…Modern-day graph workloads operate on huge graphs through pointer chasing which leads to high last-level cache (LLC) miss rates and limited memory-level…”
Journal Article
13
Bandwidth-Aware Dynamic Prefetch Configuration for IBM POWER8
Published in IEEE transactions on parallel and distributed systems (01-08-2020)
“…Advanced hardware prefetch engines are being integrated in current high-performance processors. Prefetching can boost the performance of most applications,…”
Journal Article
14
SYNPA: SMT Performance Analysis and Allocation of Threads to Cores in ARM Processors
Published 19-10-2023
“…Simultaneous multithreading processors improve throughput over single-threaded processors thanks to sharing internal core resources among instructions from…”
Journal Article
15
Bandwidth-Aware On-Line Scheduling in SMT Multicores
Published in IEEE transactions on computers (01-02-2016)
“…The memory hierarchy plays a critical role on the performance of current chip multiprocessors. Main memory is shared by all the running processes, which can…”
Journal Article
16
Precise Runahead Execution
Published in 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) (01-02-2020)
“…Runahead execution improves processor performance by accurately prefetching long-latency memory accesses. When a long-latency load causes the instruction…”
Conference Proceeding
17
CELLO: Compiler-Assisted Efficient Load-Load Ordering in Data-Race-Free Regions
Published in 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT) (21-10-2023)
“…Efficient Total Store Order (TSO) implementations allow loads to execute speculatively out-of-order. To detect order violations, the load queue (LQ) holds all…”
Conference Proceeding
18
Precise Runahead Execution
Published in IEEE computer architecture letters (01-01-2019)
“…Runahead execution improves processor performance by accurately prefetching long-latency memory accesses. When a long-latency load causes the instruction…”
Journal Article
19
Understanding Cache Hierarchy Contention in CMPs to Improve Job Scheduling
Published in 2012 IEEE 26th International Parallel and Distributed Processing Symposium (01-05-2012)
“…In order to improve CMP performance, recent research has focused on scheduling to mitigate contention produced by the limited memory bandwidth. Nowadays,…”
Conference Proceeding
20
Understanding Cloud Workloads Performance in a Production like Environment
Published 10-10-2020
“…Understanding inter-VM interference is of paramount importance to provide a sound knowledge and understand where performance degradation comes from in the…”
Journal Article