Generalization Bounds for Label Noise Stochastic Gradient Descent
Main Authors:
Format: Journal Article
Language: English
Published: 31-10-2023
Summary: We develop generalization error bounds for stochastic gradient descent (SGD) with label noise in non-convex settings under uniform dissipativity and smoothness conditions. Under a suitable choice of semimetric, we establish a contraction in Wasserstein distance of the label noise stochastic gradient flow that depends polynomially on the parameter dimension $d$. Using the framework of algorithmic stability, we derive time-independent generalization error bounds for the discretized algorithm with a constant learning rate. The error bound we achieve scales polynomially in $d$ and decays at the rate $n^{-2/3}$, where $n$ is the sample size. This rate is better than the best-known rate of $n^{-1/2}$ established for stochastic gradient Langevin dynamics (SGLD), which employs parameter-independent Gaussian noise, under similar conditions. Our analysis offers quantitative insights into the effect of label noise.
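To make the contrast drawn in the summary concrete, the sketch below implements the two update rules on a toy linear least-squares problem. The model, the loss, and the hyperparameter values (`eta`, `sigma`, `beta`) are illustrative assumptions, not the paper's setup; the point is only that label noise enters through the gradient and so induces data-dependent (and, for nonlinear models, parameter-dependent) noise, whereas SGLD injects parameter-independent isotropic Gaussian noise directly.

```python
import numpy as np

# Minimal sketch, assuming a linear model with squared loss. Step size eta,
# label noise level sigma, and inverse temperature beta are illustrative.
rng = np.random.default_rng(0)

def grad_loss(theta, X, y):
    """Gradient of the empirical squared loss (1 / 2n) * ||X @ theta - y||^2."""
    return X.T @ (X @ theta - y) / len(y)

def label_noise_sgd_step(theta, X, y, eta=0.05, sigma=0.5):
    """Label noise SGD: perturb the labels each step, then take a gradient step.

    The induced noise on theta is X.T @ noise / n, so it depends on the data
    (and, for nonlinear models, on the current parameters), unlike SGLD's noise.
    """
    y_tilde = y + sigma * rng.standard_normal(y.shape)
    return theta - eta * grad_loss(theta, X, y_tilde)

def sgld_step(theta, X, y, eta=0.05, beta=10.0):
    """SGLD: gradient step plus parameter-independent isotropic Gaussian noise."""
    xi = rng.standard_normal(theta.shape)
    return theta - eta * grad_loss(theta, X, y) + np.sqrt(2.0 * eta / beta) * xi

# Toy run with parameter dimension d = 3 and sample size n = 100.
d, n = 3, 100
X = rng.standard_normal((n, d))
theta_star = rng.standard_normal(d)
y = X @ theta_star + 0.1 * rng.standard_normal(n)

theta = np.zeros(d)
for _ in range(500):
    theta = label_noise_sgd_step(theta, X, y)
print("label noise SGD estimate:", theta)
```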
DOI: 10.48550/arxiv.2311.00274