Improving Network Slimming With Nonconvex Regularization

Bibliographic Details
Published in: IEEE Access, Vol. 9, pp. 115292–115314
Main Authors: Bui, Kevin; Park, Fredrick; Zhang, Shuai; Qi, Yingyong; Xin, Jack
Format: Journal Article
Language: English
Published: Piscataway, NJ: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2021
Description
Summary: Convolutional neural networks (CNNs) have developed into powerful models for computer vision tasks ranging from object detection to semantic segmentation. However, most state-of-the-art CNNs cannot be deployed directly on edge devices such as smartphones and drones, which require low latency under limited power and memory bandwidth. One popular, straightforward approach to compressing CNNs is network slimming, which imposes $\ell_1$ regularization on the channel-associated scaling factors in the batch normalization layers during training. Network slimming thereby identifies insignificant channels that can be pruned for inference. In this paper, we propose replacing the $\ell_1$ penalty with an alternative nonconvex, sparsity-inducing penalty in order to yield a more compressed and/or more accurate CNN architecture. We investigate $\ell_p$ ($0 < p < 1$), transformed $\ell_1$ (T$\ell_1$), the minimax concave penalty (MCP), and the smoothly clipped absolute deviation (SCAD) penalty due to their recent successes and popularity in solving sparse optimization problems such as compressed sensing and variable selection. We demonstrate the effectiveness of network slimming with nonconvex penalties on three neural network architectures (VGG-19, DenseNet-40, and ResNet-164) on standard image classification datasets. Based on the numerical experiments, T$\ell_1$ preserves model accuracy against channel pruning; $\ell_{1/2}$ and $\ell_{3/4}$ yield more compressed models with accuracies after retraining similar to $\ell_1$; and MCP and SCAD provide more accurate models after retraining with compression similar to $\ell_1$. Network slimming with T$\ell_1$ regularization also outperforms the latest Bayesian modification of network slimming in compressing a CNN architecture in terms of memory storage while preserving its model accuracy after channel pruning.
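Illustration (not from the record or the paper's code): network slimming adds a sparsity penalty on the batch-normalization scaling factors (gamma) to the training loss, and the paper swaps the $\ell_1$ penalty for a nonconvex one. The following minimal PyTorch-style sketch assumes the standard definitions of the transformed $\ell_1$ and MCP penalties; the helper names (transformed_l1, mcp, slimming_penalty) and the hyperparameter values are illustrative assumptions, not taken from the paper.

import torch
import torch.nn as nn

def transformed_l1(x, a=1.0):
    # Transformed l1: (a + 1)|x| / (a + |x|), a > 0; interpolates between l0 and l1.
    absx = x.abs()
    return ((a + 1.0) * absx / (a + absx)).sum()

def mcp(x, lam=1.0, gam=2.0):
    # Minimax concave penalty: quadratic near zero, constant beyond gam * lam.
    absx = x.abs()
    quad = lam * absx - absx.pow(2) / (2.0 * gam)
    flat = torch.full_like(absx, 0.5 * gam * lam ** 2)
    return torch.where(absx <= gam * lam, quad, flat).sum()

def slimming_penalty(model, penalty=transformed_l1):
    # Sum the chosen penalty over the scaling factors of every BatchNorm2d layer.
    total = 0.0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            total = total + penalty(m.weight)
    return total

# Inside a training loop (illustrative), with reg_strength a small constant:
#   loss = criterion(model(images), labels) + reg_strength * slimming_penalty(model)
#   loss.backward(); optimizer.step()

After training, channels whose scaling factors fall below a threshold would be pruned and the network retrained, following the slimming procedure summarized in the abstract.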
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3105366