Communication-efficient federated learning with stagewise training strategy

Bibliographic Details
Published in:	Neural Networks, Vol. 167, pp. 460-472
Main Authors:	Cheng, Yifei; Shen, Shuheng; Liang, Xianfeng; Liu, Jingchang; Chen, Joya; Zhang, Tie; Chen, Enhong
Format:	Journal Article
Language:	English
Published:	Elsevier Ltd 01-10-2023
Subjects:	Communication complexity; Convergence rate; Federated learning; Optimization algorithm

Description
Summary:	Communication efficiency across workers is a significant factor in the performance of federated learning. Although periodic communication strategies are applied to reduce the number of communication rounds during training, the communication cost remains high when the training data are not independent and identically distributed (non-IID), which is common in federated learning. Recently, several works have introduced variance reduction to eliminate the effect of non-IID data among workers. Nevertheless, the provably optimal communication complexity O(log(ST)) and convergence rate O(1/(ST)) cannot be achieved simultaneously, where S denotes the number of workers sampled in each round and T is the number of iterations. To resolve this dilemma, we propose SQUARFA, an optimization algorithm that adopts a stagewise training framework coupled with variance reduction and uses a quick-start phase in each loop. Theoretical results show that SQUARFA achieves both the optimal convergence rate and the optimal communication complexity for strongly convex objectives and for non-convex objectives under the Polyak-Łojasiewicz (PL) condition, thus filling the gap mentioned above. A variant of SQUARFA then yields the optimal theoretical results for general non-convex objectives. We further extend the technique in SQUARFA to the large-batch setting and achieve optimal communication complexity. Experimental results demonstrate the superiority of the proposed algorithms.
ISSN:	0893-6080, 1879-2782
DOI:	10.1016/j.neunet.2023.08.033
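
Illustrative sketch (not from the article): the summary refers to periodic communication, sampling S workers per round, and stagewise training. The Python below is a minimal, generic example of local SGD with periodic averaging and a stagewise schedule. It is not SQUARFA: it has no variance reduction and no quick-start phase. The function name, the scalar quadratic objectives, and the schedule (halve the step size and double the number of local steps after each stage) are all assumptions chosen for illustration.

```python
import random


def local_sgd_stagewise(targets, stages=4, rounds_per_stage=5, local_steps=4,
                        lr=0.4, sample_size=2, noise=0.0, seed=0):
    """Generic stagewise local SGD with periodic averaging (NOT SQUARFA).

    Hypothetical setup: worker i holds f_i(x) = 0.5 * (x - targets[i]) ** 2,
    so different targets stand in for non-IID data. The minimizer of the
    average objective is the mean of the targets.
    """
    rng = random.Random(seed)
    x = 0.0            # global model held by the server
    comm_rounds = 0    # number of averaging (communication) rounds
    for _ in range(stages):
        for _ in range(rounds_per_stage):
            # Sample S workers without replacement for this round.
            sampled = rng.sample(range(len(targets)), sample_size)
            local_models = []
            for i in sampled:
                xi = x  # each sampled worker starts from the global model
                for _ in range(local_steps):
                    # Stochastic gradient of f_i with optional Gaussian noise.
                    grad = (xi - targets[i]) + rng.gauss(0.0, noise)
                    xi -= lr * grad
                local_models.append(xi)
            # Periodic communication: average the local models once per round.
            x = sum(local_models) / len(local_models)
            comm_rounds += 1
        # Assumed stagewise schedule: smaller steps, less frequent averaging.
        lr /= 2
        local_steps *= 2
    return x, comm_rounds


if __name__ == "__main__":
    model, rounds = local_sgd_stagewise([1.0, 2.0, 3.0, 6.0], sample_size=4)
    print(model, rounds)
```

With the defaults above, each sampled worker runs 5 * (4 + 8 + 16 + 32) = 300 local steps while the server averages only 20 times, which is the kind of trade-off between iterations and communication rounds that the summary discusses.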