Search Results - "Roma, Nuno"
1
A Compute Cache System for Signal Processing Applications
Published in Journal of signal processing systems (01-10-2021)“…Nowadays, processing systems are constrained by the low efficiency of their memory subsystems. Although memories evolved into faster and more efficient devices…”
Journal Article -
2
A Reconfigurable Posit Tensor Unit with Variable-Precision Arithmetic and Automatic Data Streaming
Published in Journal of signal processing systems (01-12-2021)“…The increased adoption of DNN applications drove the emergence of dedicated tensor computing units to accelerate multi-dimensional matrix multiplication…”
Journal Article -
3
NDPmulator: Enabling Full-System Simulation for Near-Data Accelerators From Caches to DRAM
Published in IEEE access (2024)“…The accurate simulation and performance assessment of Near-Data Accelerators (NDAccs) is a complex challenge as it must consider the operation of the entire…”
Journal Article -
4
GPU Static Modeling Using PTX and Deep Structured Learning
Published in IEEE access (2019)“…In the quest for exascale computing, energy-efficiency is a fundamental goal in high-performance computing systems, typically achieved via dynamic voltage and…”
Journal Article -
5
Efficient Hybrid DCT-Domain Algorithm for Video Spatial Downscaling
Published in EURASIP journal on advances in signal processing (01-01-2007)“…A highly efficient video downscaling algorithm for any arbitrary integer scaling factor performed in a hybrid pixel transform domain is proposed. This algorithm…”
Journal Article -
6
Efficient Hybrid DCT-Domain Algorithm for Video Spatial Downscaling
Published in EURASIP journal on advances in signal processing (10-09-2007)“…A highly efficient video downscaling algorithm for any arbitrary integer scaling factor performed in a hybrid pixel transform domain is proposed. This…”
Journal Article -
7
Compiler-Assisted Data Streaming for Regular Code Structures
Published in IEEE transactions on computers (01-03-2021)“…The performance of modern processors is often limited by execution stalls resulting from long memory access latencies. Compile-time optimizations, deep cache…”
Journal Article -
8
Unified Posit/IEEE-754 Vector MAC Unit for Transprecision Computing
Published in IEEE transactions on circuits and systems. II, Express briefs (01-05-2022)“…Transprecision computing targets energy-efficiency with multiple floating-point modules with different precisions to suit application requirements…”
Journal Article -
9
Modeling and Decoupling the GPU Power Consumption for Cross-Domain DVFS
Published in IEEE transactions on parallel and distributed systems (01-11-2019)“…Dynamic voltage and frequency scaling (DVFS) is a popular technique to improve the energy-efficiency of high-performance computing systems. It allows placing…”
Journal Article -
10
A tutorial overview on the properties of the discrete cosine transform for encoded image and video processing
Published in Signal processing (01-11-2011)“…Discrete trigonometric transforms, such as the discrete cosine transform (DCT) and the discrete sine transform (DST), have been extensively used in signal…”
Journal Article -
11
Decoupling GPGPU voltage-frequency scaling for deep-learning applications
Published in Journal of parallel and distributed computing (01-07-2022)“…• GPUs may be safely undervolted, allowing for non-conventional DVFS configurations. • A benchmark suite characterizes GPU components regarding undervoltage…”
Journal Article -
12
Flying tourist problem: Flight time and cost minimization in complex routes
Published in Expert systems with applications (15-09-2019)“…• The NP-hard Flying Tourist Problem, a model for multi-city flight requests. • An efficient solution of the problem, based on a meta-heuristic methodology. • High…”
Journal Article -
13
Adaptive In-Cache Streaming for Efficient Data Management
Published in IEEE transactions on very large scale integration (VLSI) systems (01-07-2017)“…The design of adaptive architectures is frequently focused on the sole adaptation of the processing blocks, often neglecting the power/performance impact of…”
Journal Article -
14
Compiling for Vector Extensions With Stream-Based Specialization
Published in IEEE MICRO (01-09-2022)“…To overcome the current performance wall, data streaming and data-flow computing paradigms have been gradually making their way into the general-purpose…”
Journal Article -
15
GPGPU Power Modeling for Multi-domain Voltage-Frequency Scaling
Published in 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA) (01-02-2018)“…Dynamic Voltage and Frequency Scaling (DVFS) on Graphics Processing Units (GPUs) components is one of the most promising power management strategies, due to…”
Conference Proceeding -
16
Special issue on real-time energy-aware circuits and systems for HEVC and for its 3D and SVC extensions
Published in Journal of real-time image processing (01-03-2017)“…Since its approval, in 2013, the high-efficiency video coding (HEVC) standard [1, 2] has established as the new state-of-the-art on video compression…”
Journal Article -
17
DVFS-aware application classification to improve GPGPUs energy efficiency
Published in Parallel computing (01-04-2019)“…• Exploring the effects of core and memory DVFS on the execution of GPU applications. • GPU characterization scheme validated on multiple GPU devices. • Classes of…”
Journal Article -
18
Positnn: Training Deep Neural Networks with Mixed Low-Precision Posit
Published in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (06-06-2021)“…Low-precision formats have proven to be an efficient way to reduce not only the memory footprint but also the hardware resources and power consumption of deep…”
Conference Proceeding -
19
Trading Performance, Power, and Area on Low-Precision Posit MAC Units for CNN Training
Published in 2023 IEEE 35th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) (17-10-2023)“…The recently proposed Posit number system has been regarded as a particularly well-suited floating-point format to optimize the throughput and efficiency of…”
Conference Proceeding -
20
gem5-ndp: Near-Data Processing Architecture Simulation From Low Level Caches to DRAM
Published in 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) (01-11-2022)“…Unlike standard accelerators, the performance of Near-Data Processing (NDP) devices highly depends on the operation of the surrounding system, namely, the…”
Conference Proceeding