swCaffe: A Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight

Bibliographic Details
Published in: 2018 IEEE International Conference on Cluster Computing (CLUSTER), pp. 413-422
Main Authors: Li, Liandeng; Fang, Jiarui; Fu, Haohuan; Jiang, Jinlei; Zhao, Wenlai; He, Conghui; You, Xin; Yang, Guangwen
Format: Conference Proceeding
Language: English
Published: IEEE, 01-09-2018
Description
Summary: This paper reports our efforts on swCaffe, a highly efficient parallel framework for accelerating the training of deep neural networks (DNNs) on Sunway TaihuLight, one of the fastest supercomputers in the world, which adopts a unique heterogeneous many-core architecture. First, we identify key principles for fully exploiting the performance of this innovative many-core architecture. Second, we propose a set of optimization strategies for redesigning a variety of neural network layers based on Caffe. Third, we put forward a topology-aware parameter synchronization scheme that scales the synchronous Stochastic Gradient Descent (SGD) method efficiently to many processors. We evaluate our framework by training a variety of widely used neural networks on the ImageNet dataset. On a single node, swCaffe achieves 23%-119% of the overall performance of Caffe running on an NVIDIA K40m GPU. Compared with Caffe on CPU, swCaffe runs 3.04x-7.84x faster across all networks. When training ResNet50 and AlexNet on 1024 nodes, swCaffe achieves speedups of up to 715.45x and 928.15x, respectively.
ISSN: 2168-9253
DOI: 10.1109/CLUSTER.2018.00087
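
The abstract describes scaling synchronous SGD by synchronizing parameters (gradients) across many nodes. As a rough, hypothetical illustration only, the sketch below shows generic synchronous SGD in which every MPI rank computes a gradient on its local mini-batch and the gradients are averaged with an allreduce before the identical update is applied on all ranks. It is not the paper's topology-aware scheme (which is specific to the Sunway TaihuLight interconnect); the mpi4py-based code, the toy least-squares problem, and all names in it are assumptions made for illustration.

# Hypothetical sketch of allreduce-based synchronous SGD; not the swCaffe
# implementation. Run with, e.g., `mpirun -n 4 python sync_sgd_sketch.py`.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, world = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(seed=rank)   # each rank draws a different mini-batch
w = np.zeros(8)                           # replicated model parameters
lr = 0.1

for step in range(100):
    # Toy local mini-batch for a least-squares problem y = x . w_true + noise
    x = rng.normal(size=(32, 8))
    y = x @ np.arange(8, dtype=float) + 0.01 * rng.normal(size=32)

    # Local gradient of 0.5 * ||x w - y||^2 averaged over the mini-batch
    grad = x.T @ (x @ w - y) / len(y)

    # Synchronous step: sum gradients across all ranks, then average
    comm.Allreduce(MPI.IN_PLACE, grad, op=MPI.SUM)
    grad /= world

    w -= lr * grad                        # identical update on every rank

if rank == 0:
    print("final weights:", np.round(w, 2))

Because every rank applies the same averaged gradient, all model replicas stay in lockstep; the difficulty the paper addresses is making this reduction step communication-efficient on the machine's network topology so that it scales to 1024 nodes.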