On Optimal Early Stopping: Over-informative versus Under-informative Parametrization
Main Authors:
Format: Journal Article
Language: English
Published: 20-02-2022
Summary: Early stopping is a simple and widely used method to prevent over-training neural networks. We develop theoretical results to reveal the relationship between the optimal early stopping time and the model dimension as well as the sample size of the dataset for certain linear models. Our results demonstrate two very different behaviors when the model dimension exceeds the number of features versus the opposite scenario. While most previous works on linear models focus on the latter setting, we observe that the dimension of the model often exceeds the number of features arising from data in common deep learning tasks, and we propose a model to study this setting. We demonstrate experimentally that our theoretical results on the optimal early stopping time correspond to the training process of deep neural networks.
DOI: 10.48550/arxiv.2202.09885
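
The abstract contrasts training in an over-informative regime (model dimension larger than the number of features) with the classical under-informative one, and studies the iteration at which test error is minimized. As a rough illustration of the generic early-stopping phenomenon only, and not of the paper's specific parametrization or theory, the following is a minimal sketch of gradient descent on a noisy linear regression problem where the model dimension exceeds the sample size; the test risk is tracked to locate an empirical optimal stopping iteration. All dimensions, noise levels, and step sizes are assumed values chosen for the illustration.

```python
# Illustrative sketch (not from the paper): gradient descent on a linear model
# with more parameters than training samples, tracking test risk to find the
# iteration with the lowest test error (an empirical "optimal early stopping
# time"). All problem sizes and hyperparameters below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

n_train, n_test, d = 50, 1000, 200     # d > n_train: more parameters than samples
sigma = 0.5                            # label noise level (assumed)

w_star = rng.normal(size=d) / np.sqrt(d)           # ground-truth weights
X_train = rng.normal(size=(n_train, d))
y_train = X_train @ w_star + sigma * rng.normal(size=n_train)
X_test = rng.normal(size=(n_test, d))
y_test = X_test @ w_star + sigma * rng.normal(size=n_test)

w = np.zeros(d)                        # zero initialization
lr = 1e-3
test_risks = []

for t in range(5000):
    # full-batch gradient of the squared loss (1/2n) * ||Xw - y||^2
    grad = X_train.T @ (X_train @ w - y_train) / n_train
    w -= lr * grad
    test_risks.append(np.mean((X_test @ w - y_test) ** 2))

t_opt = int(np.argmin(test_risks))
print(f"empirical optimal stopping iteration ~ {t_opt}, "
      f"test risk {test_risks[t_opt]:.4f} vs final {test_risks[-1]:.4f}")
```

With noisy labels, the test risk in such a run typically decreases, reaches a minimum, and then rises as the iterates fit the noise, which is the behavior early stopping exploits; how the location of that minimum scales with model dimension and sample size is the question the paper analyzes.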