Distilled One-Shot Federated Learning
Main Authors:
Format: Journal Article
Language: English
Published: 16-09-2020
Subjects:
Summary:
Current federated learning algorithms require tens of communication rounds, each transmitting unwieldy model weights, under ideal conditions, and hundreds of rounds when data is poorly distributed. Inspired by recent work on dataset distillation and distributed one-shot learning, we propose Distilled One-Shot Federated Learning (DOSFL) to significantly reduce the communication cost while achieving comparable performance. In just one round, each client distills its private dataset, sends the synthetic data (e.g., images or sentences) to the server, and collectively trains a global model. The distilled data look like noise and are useful only with the specific model weights they were distilled for, i.e., they become useless once the model is updated. With this weight-less and gradient-less design, the total communication cost of DOSFL is up to three orders of magnitude less than that of FedAvg, while preserving 93% to 99% of the performance of a centralized counterpart. Afterwards, clients can switch to traditional methods such as FedAvg to fine-tune the last few percent and fit personalized models to their local datasets. Through comprehensive experiments, we show the accuracy and communication performance of DOSFL on both vision and language tasks with different models, including CNNs, LSTMs, and Transformers. We demonstrate that, without knowing the initial model weights, an eavesdropping attacker cannot train a good model from leaked distilled data. DOSFL serves as an inexpensive method for quickly converging on a performant pre-trained model, with less than 0.1% of the communication cost of traditional methods.
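The summary outlines a concrete single-round protocol, so a short sketch may help make the data flow tangible. The following is a minimal illustration, not the authors' code: it assumes a toy linear classifier on flattened 28x28 inputs, single-step dataset distillation, and hypothetical names (`distill_client_data`, `server_train`) and hyperparameters. It shows only the core mechanic implied by the summary: each client optimizes a few synthetic examples (and a step size) so that a gradient step from the shared initial weights reproduces training on its private data, and the server trains the global model from the same initialization on the pooled synthetic uploads.

```python
# Minimal sketch of the one-round DOSFL flow described above, assuming a toy
# linear classifier and single-step dataset distillation. All names and
# hyperparameters are illustrative, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES, DIM = 10, 28 * 28


def distill_client_data(init_state, real_x, real_y, n_syn=10, steps=200):
    """Client side: learn a few synthetic examples (plus a step size) so that
    one gradient step from the shared initial weights fits the client's real
    data. Only (syn_x, syn_y, syn_lr) is uploaded to the server."""
    syn_x = torch.randn(n_syn, DIM, requires_grad=True)
    syn_y = torch.arange(n_syn) % NUM_CLASSES           # fixed labels for simplicity
    syn_lr = torch.tensor(0.02, requires_grad=True)     # distilled learning rate
    opt = torch.optim.Adam([syn_x, syn_lr], lr=0.01)

    for _ in range(steps):
        # Always start the inner update from the shared initialization w0.
        w0 = init_state["weight"].detach().clone().requires_grad_(True)
        b0 = init_state["bias"].detach().clone().requires_grad_(True)

        # Inner step: one SGD update on the synthetic data; keep the graph so
        # gradients flow back into syn_x and syn_lr (second-order backprop).
        inner_loss = F.cross_entropy(F.linear(syn_x, w0, b0), syn_y)
        gw, gb = torch.autograd.grad(inner_loss, (w0, b0), create_graph=True)
        w1, b1 = w0 - syn_lr * gw, b0 - syn_lr * gb

        # Outer objective: the updated weights should fit the real local data.
        outer_loss = F.cross_entropy(F.linear(real_x, w1, b1), real_y)
        opt.zero_grad()
        outer_loss.backward()
        opt.step()

    return syn_x.detach(), syn_y, syn_lr.detach()


def server_train(init_state, uploads, epochs=3):
    """Server side: train the global model from the same initialization on the
    pooled synthetic data received from all clients in the single round."""
    model = nn.Linear(DIM, NUM_CLASSES)
    model.load_state_dict(init_state)
    params = list(model.parameters())
    for _ in range(epochs):
        for syn_x, syn_y, syn_lr in uploads:
            loss = F.cross_entropy(model(syn_x), syn_y)
            grads = torch.autograd.grad(loss, params)
            with torch.no_grad():
                for p, g in zip(params, grads):
                    p -= syn_lr * g                      # apply the distilled step size
    return model


if __name__ == "__main__":
    # Shared initialization w0, broadcast once; random tensors stand in for the
    # clients' private datasets.
    init_state = nn.Linear(DIM, NUM_CLASSES).state_dict()
    clients = [(torch.randn(64, DIM), torch.randint(0, NUM_CLASSES, (64,)))
               for _ in range(3)]

    # The single communication round: each client uploads only synthetic data.
    uploads = [distill_client_data(init_state, x, y) for x, y in clients]
    global_model = server_train(init_state, uploads)
    print("Global model trained on",
          sum(u[0].shape[0] for u in uploads), "synthetic examples")
```

Because the synthetic examples are optimized against one specific initialization, they are of little use without those initial weights, which is the intuition behind the eavesdropping claim in the summary; a real implementation would use the paper's CNN, LSTM, or Transformer models and multi-step distillation rather than this toy setup.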
DOI: 10.48550/arxiv.2009.07999