Search Results - "Fukumoto, Naoto"
1
The 16,384-node Parallelism of 3D-CNN Training on An Arm CPU based Supercomputer
Published in 2021 IEEE 28th International Conference on High Performance Computing, Data, and Analytics (HiPC) (01-01-2021)“…As the computational cost and datasets available for deep neural network training continue to increase, there is a significant demand for fast distributed…”
Conference Proceeding -
2
A traffic-aware memory-cube network using bypassing
Published in Microprocessors and microsystems (01-04-2022)“…Three-dimensional stack memory which provides both high-bandwidth access and large capacity is a promising technology for next-generation computer systems…”
Journal Article -
3
Efficient Collision-Free MTTKRP Algorithm for Multi-core CPUs with Less Memory Usage
Published in 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid) (01-05-2022)“…Tensor decomposition is often used to extract underlying features in the analysis of large and multi-dimensional data. For the tensor data with sparse…”
Conference Proceeding -
4
Performance Analysis of Multi-Containerized MD Simulations for Low-Level Resource Allocation
Published in 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (01-05-2022)“…This study discusses scheduling strategies to maximize ensemble throughput, which is the total throughput of multiple containers running simultaneously. Such a…”
Conference Proceeding -
5
Towards Straggler-Tolerant and Accuracy-Aware Distributed DNN Training in Clouds
Published in 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid) (01-05-2021)“…This study investigated how straggler mitigation affects accuracy during distributed training. While distributed training is one promising way to shorten…”
Conference Proceeding -
6
Performance Analysis of Quantum Computer Simulators Across Different Environments
Published in 2024 IEEE/ACIS 22nd International Conference on Software Engineering Research, Management and Applications (SERA) (30-05-2024)“…Quantum computers can achieve extremely fast computations for certain problems using quantum properties. Due to these factors, research and development in the…”
Conference Proceeding -
7
3D implemented SRAM/DRAM hybrid cache architecture for high-performance and low power consumption
Published in 2011 IEEE 54th International Midwest Symposium on Circuits and Systems (MWSCAS) (01-08-2011)“…This paper introduces our research status focusing on 3D-implemented microprocessors. 3D-IC is one of the most interesting techniques to achieve…”
Conference Proceeding -
8
mpiQulacs: A Scalable Distributed Quantum Computer Simulator for ARM-based Clusters
Published in 2023 IEEE International Conference on Quantum Computing and Engineering (QCE) (17-09-2023)“…Quantum computer simulators running on classical computers are essential for developing real quantum computers and emerging quantum applications. In…”
Conference Proceeding -
9
mpiQulacs: A Distributed Quantum Computer Simulator for A64FX-based Cluster Systems
Published 30-03-2022“…Quantum computer simulators running on classical computers are essential for developing real quantum computers and emerging quantum applications. In…”
Journal Article -
10
MLPerf™ HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems
Published in 2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC) (01-11-2021)“…Scientific communities are increasingly adopting machine learning and deep learning models in their applications to accelerate scientific insights. High…”
Conference Proceeding -
11
Low-Latency Low-Energy Memory-Cube Networks using Dual-Voltage Datapaths
Published in 2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) (01-03-2021)“…Three-dimensional stack memory that provides both high-bandwidth access and large capacity is a promising technology for next-generation computer systems…”
Conference Proceeding -
12
Yet Another Accelerated SGD: ResNet-50 Training on ImageNet in 74.7 seconds
Published 29-03-2019“…There has been a strong demand for algorithms that can execute machine learning as fast as possible and the speed of deep learning has accelerated by 30…”
Journal Article -
13
MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems
Published 21-10-2021“…Scientific communities are increasingly adopting machine learning and deep learning models in their applications to accelerate scientific insights. High…”
Journal Article -
14
Analyzing the impact of data prefetching on Chip MultiProcessors
Published in 2008 13th Asia-Pacific Computer Systems Architecture Conference (01-08-2008)“…Data prefetching is a well known approach to compensating for poor memory performance, and has been employed in commercial processor chips. Although a number…”
Conference Proceeding