Deep learning acceleration in 14nm CMOS compatible ReRAM array: device, material and algorithm co-optimization

Bibliographic Details
Published in: 2022 International Electron Devices Meeting (IEDM), pp. 33.7.1 - 33.7.4
Main Authors: Gong, N., Rasch, M.J., Seo, S.-C., Gasasira, A., Solomon, P., Bragaglia, V., Consiglio, S., Higuchi, H., Park, C., Brew, K., Jamison, P., Catano, C., Saraf, I., Athena, F.F., Silvestre, C., Liu, X., Khan, B., Jain, N., Mcdermott, S., Johnson, R., Estrada-Raygoza, I., Li, J., Gokmen, T., Li, N., Pujari, R., Carta, F., Miyazoe, H., Frank, M.M., Koty, D., Yang, Q., Clark, R., Tapily, K., Wajda, C., Mosden, A., Shearer, J., Metz, A., Teehan, S., Saulnier, N., Offrein, B.J., Tsunomura, T., Leusink, G., Narayanan, V., Ando, T.
Format: Conference Proceeding
Language: English
Published: IEEE, 03-12-2022
Description
Summary: We show for the first time in hardware that, in contrast to conventional stochastic gradient descent (SGD), our modified SGD algorithm (TTv2), together with a co-optimized ReRAM material, achieves respectable accuracy (98%) on a reduced MNIST classification task (digits 0 and 1), approaching the floating-point (FP) baseline. To extrapolate these insights toward larger DNN training workloads in simulations, we establish an analog switching test sequence and extract key device statistics from 6T1R ReRAM arrays (up to 2k devices) built on a 14nm CMOS baseline. With this, we find that for larger DNN workloads, device and algorithm co-optimization yields dramatic improvements compared with standard SGD on baseline ReRAM. The remaining gap to the reference floating-point accuracy across the tested DNNs indicates that further material and algorithmic optimizations are still needed. This work shows a pathway toward scalable in-memory deep learning training using ReRAM crossbar arrays.
ISSN: 2156-017X
DOI: 10.1109/IEDM45625.2022.10019569
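
As context for the TTv2 algorithm named in the summary above, here is a minimal sketch (not the authors' code) of Tiki-Taka-style analog in-memory training using IBM's open-source aihwkit simulator, where the Tiki-Taka/TTv2 algorithm family is implemented. The specific preset name, layer sizes, and hyperparameters are illustrative assumptions and may differ across aihwkit versions.

# A minimal sketch of analog in-memory training with a Tiki-Taka-style
# optimizer in aihwkit; the preset and sizes below are assumptions, not the
# paper's exact setup.
import torch
from torch import nn
from aihwkit.nn import AnalogLinear
from aihwkit.optim import AnalogSGD
from aihwkit.simulator.presets import TikiTakaReRamSBPreset

# One analog layer for a reduced (0 & 1) MNIST task: 784 inputs, 2 classes.
# The preset models a ReRAM-like device with a Tiki-Taka compound update.
model = AnalogLinear(784, 2, rpu_config=TikiTakaReRamSBPreset())

optimizer = AnalogSGD(model.parameters(), lr=0.1)
optimizer.regroup_param_groups(model)
loss_fn = nn.CrossEntropyLoss()

# One training step on a dummy batch (substitute real MNIST 0/1 images).
x = torch.rand(8, 784)
y = torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()  # updates are applied on the simulated analog tiles

In this setup the weight gradients are never applied directly; the Tiki-Taka scheme accumulates updates on an auxiliary array and transfers them to the main array, which is what makes training tolerant to the asymmetric, noisy switching of real ReRAM devices characterized in the paper.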