KrADagrad: Kronecker Approximation-Domination Gradient Preconditioned Stochastic Optimization

Second-order stochastic optimizers allow the parameter update step size and direction to adapt to loss curvature, but have traditionally required too much memory and compute for deep learning. Recently, Shampoo [Gupta et al., 2018] introduced a Kronecker-factored preconditioner to reduce these requirements...
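To make the abstract's reference to a Kronecker-factored preconditioner concrete, here is a minimal sketch of the Shampoo-style update it builds on: for a matrix-shaped gradient G, two small factor statistics L = sum G G^T and R = sum G^T G are maintained, and the gradient is preconditioned by their inverse fourth roots. Function names and the clamping constant are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def matrix_power(mat, p, eps=1e-6):
    # Power of a symmetric PSD matrix via eigendecomposition;
    # eigenvalues are clamped with eps for numerical stability (assumed choice).
    w, v = np.linalg.eigh(mat)
    w = np.maximum(w, 0.0) + eps
    return (v * w**p) @ v.T

def shampoo_precondition(G, L, R):
    # One Shampoo-style step for an m x n gradient G.
    # L (m x m) and R (n x n) are the running Kronecker factor statistics.
    L = L + G @ G.T
    R = R + G.T @ G
    # Precondition: L^{-1/4} G R^{-1/4}.
    PG = matrix_power(L, -0.25) @ G @ matrix_power(R, -0.25)
    return PG, L, R

# Hypothetical usage on a random gradient.
rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))
L = np.zeros((4, 4))
R = np.zeros((3, 3))
PG, L, R = shampoo_precondition(G, L, R)
print(PG.shape)  # (4, 3)
```

Storing the m x m and n x n factors instead of a full mn x mn preconditioner is what makes the memory cost tractable for deep models.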


Bibliographic Details
Main Authors: Mei, Jonathan, Moreno, Alexander, Walters, Luke
Format: Journal Article
Language: English
Published: 30-05-2023