Split Learning over Wireless Networks: Parallel Design and Resource Management

Bibliographic Details
Published in: IEEE Journal on Selected Areas in Communications, Vol. 41, no. 4, p. 1
Main Authors: Wu, Wen; Li, Mushu; Qu, Kaige; Zhou, Conghao; Shen, Xuemin; Zhuang, Weihua; Li, Xu; Shi, Weisen
Format: Journal Article
Language: English
Published: New York IEEE 01-04-2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Description
Summary: Split learning (SL) is a collaborative learning framework that trains an artificial intelligence (AI) model between a device and an edge server by splitting the model at a cut layer into a device-side model and a server-side model. The existing SL approach conducts training sequentially across devices, which incurs significant training latency, especially when the number of devices is large. In this paper, we design a novel SL scheme, named Cluster-based Parallel SL (CPSL), that reduces training latency by conducting model training in a "first-parallel-then-sequential" manner. Specifically, CPSL partitions devices into several clusters, trains the device-side models within each cluster in parallel and aggregates them, and then trains the whole AI model sequentially across clusters, thereby parallelizing the training process and reducing training latency. Furthermore, we propose a resource management algorithm that minimizes the training latency of CPSL while accounting for device heterogeneity and network dynamics in wireless networks. This is achieved by stochastically optimizing the cut layer selection, device clustering, and radio spectrum allocation. The proposed two-timescale algorithm jointly makes the cut layer selection decision on a large timescale and the device clustering and radio spectrum allocation decisions on a small timescale. Extensive simulation results on non-independent and identically distributed data demonstrate that the proposed solution greatly reduces training latency compared with existing SL benchmarks while adapting to network dynamics.
ISSN: 0733-8716, 1558-0008
DOI: 10.1109/JSAC.2023.3242704
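
Illustration: the "first-parallel-then-sequential" schedule described in the summary can be sketched with a toy latency model. This is a minimal sketch and not the authors' algorithm or latency model. It assumes that a cluster finishes a round when its slowest device finishes, that per-device latencies are fixed and known, and that aggregation is an element-wise mean of device-side parameters. The function names, the latency values, and the clustering are all hypothetical.

```python
from typing import Dict, List


def sequential_sl_latency(device_latency: Dict[str, float]) -> float:
    """Vanilla SL: devices train one after another, so latencies add up."""
    return sum(device_latency.values())


def cpsl_latency(clusters: List[List[str]],
                 device_latency: Dict[str, float]) -> float:
    """Parallel inside each cluster, sequential across clusters.

    Assumption: a cluster's latency is that of its slowest device.
    """
    return sum(max(device_latency[d] for d in cluster) for cluster in clusters)


def average_models(models: List[List[float]]) -> List[float]:
    """Assumed aggregation rule: element-wise mean of device-side parameters."""
    n = len(models)
    return [sum(weights) / n for weights in zip(*models)]


# Hypothetical per-device latency for one training round, in seconds.
device_latency = {"d1": 2.0, "d2": 3.0, "d3": 1.0, "d4": 4.0}
# Hypothetical clustering decision (two clusters of two devices).
clusters = [["d1", "d2"], ["d3", "d4"]]

seq = sequential_sl_latency(device_latency)   # 2 + 3 + 1 + 4
par = cpsl_latency(clusters, device_latency)  # max(2, 3) + max(1, 4)
print(f"sequential SL: {seq:.1f} s, cluster-parallel: {par:.1f} s")
```

The sketch ignores what the summary identifies as the hard part: devices in a cluster share radio spectrum, so per-device latency depends on the spectrum allocation, the cut layer, and the channel state, which is why the paper optimizes these decisions jointly.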