{"file_date_updated":"2021-05-25T09:51:36Z","ddc":["000"],"volume":119,"intvolume":" 119","year":"2020","conference":{"end_date":"2020-07-18","location":"Online","name":"ICML: International Conference on Machine Learning","start_date":"2020-07-12"},"citation":{"ama":"Kurtz M, Kopinsky J, Gelashvili R, et al. Inducing and exploiting activation sparsity for fast neural network inference. In: 37th International Conference on Machine Learning, ICML 2020. Vol 119. ; 2020:5533-5543.","apa":"Kurtz, M., Kopinsky, J., Gelashvili, R., Matveev, A., Carr, J., Goin, M., … Alistarh, D.-A. (2020). Inducing and exploiting activation sparsity for fast neural network inference. In 37th International Conference on Machine Learning, ICML 2020 (Vol. 119, pp. 5533–5543). Online.","ista":"Kurtz M, Kopinsky J, Gelashvili R, Matveev A, Carr J, Goin M, Leiserson W, Moore S, Nell B, Shavit N, Alistarh D-A. 2020. Inducing and exploiting activation sparsity for fast neural network inference. 37th International Conference on Machine Learning, ICML 2020. ICML: International Conference on Machine Learning vol. 119, 5533–5543.","chicago":"Kurtz, Mark, Justin Kopinsky, Rati Gelashvili, Alexander Matveev, John Carr, Michael Goin, William Leiserson, et al. “Inducing and Exploiting Activation Sparsity for Fast Neural Network Inference.” In 37th International Conference on Machine Learning, ICML 2020, 119:5533–43, 2020.","short":"M. Kurtz, J. Kopinsky, R. Gelashvili, A. Matveev, J. Carr, M. Goin, W. Leiserson, S. Moore, B. Nell, N. Shavit, D.-A. Alistarh, in:, 37th International Conference on Machine Learning, ICML 2020, 2020, pp. 5533–5543.","mla":"Kurtz, Mark, et al. “Inducing and Exploiting Activation Sparsity for Fast Neural Network Inference.” 37th International Conference on Machine Learning, ICML 2020, vol. 119, 2020, pp. 5533–43.","ieee":"M. Kurtz et al., “Inducing and exploiting activation sparsity for fast neural network inference,” in 37th International Conference on Machine Learning, ICML 2020, Online, 2020, vol. 119, pp. 
5533–5543."},"language":[{"iso":"eng"}],"_id":"9415","page":"5533-5543","publication":"37th International Conference on Machine Learning, ICML 2020","author":[{"first_name":"Mark","last_name":"Kurtz","full_name":"Kurtz, Mark"},{"first_name":"Justin","last_name":"Kopinsky","full_name":"Kopinsky, Justin"},{"full_name":"Gelashvili, Rati","last_name":"Gelashvili","first_name":"Rati"},{"full_name":"Matveev, Alexander","last_name":"Matveev","first_name":"Alexander"},{"first_name":"John","last_name":"Carr","full_name":"Carr, John"},{"full_name":"Goin, Michael","first_name":"Michael","last_name":"Goin"},{"full_name":"Leiserson, William","last_name":"Leiserson","first_name":"William"},{"last_name":"Moore","first_name":"Sage","full_name":"Moore, Sage"},{"last_name":"Nell","first_name":"Bill","full_name":"Nell, Bill"},{"last_name":"Shavit","first_name":"Nir","full_name":"Shavit, Nir"},{"orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh"}],"has_accepted_license":"1","publication_identifier":{"issn":["2640-3498"]},"status":"public","quality_controlled":"1","title":"Inducing and exploiting activation sparsity for fast neural network inference","user_id":"3E5EF7F0-F248-11E8-B48F-1D18A9856A87","date_created":"2021-05-23T22:01:45Z","oa":1,"oa_version":"Published Version","scopus_import":"1","date_published":"2020-07-12T00:00:00Z","abstract":[{"lang":"eng","text":"Optimizing convolutional neural networks for fast inference has recently become an extremely active area of research. One of the go-to solutions in this context is weight pruning, which aims to reduce computational and memory footprint by removing large subsets of the connections in a neural network. Surprisingly, much less attention has been given to exploiting sparsity in the activation maps, which tend to be naturally sparse in many settings thanks to the structure of rectified linear (ReLU) activation functions. In this paper, we present an in-depth analysis of methods for maximizing the sparsity of the activations in a trained neural network, and show that, when coupled with an efficient sparse-input convolution algorithm, we can leverage this sparsity for significant performance gains. To induce highly sparse activation maps without accuracy loss, we introduce a new regularization technique, coupled with a new threshold-based sparsification method based on a parameterized activation function called Forced-Activation-Threshold Rectified Linear Unit (FATReLU). We examine the impact of our methods on popular image classification models, showing that most architectures can adapt to significantly sparser activation maps without any accuracy loss. Our second contribution is showing that these these compression gains can be translated into inference speedups: we provide a new algorithm to enable fast convolution operations over networks with sparse activations, and show that it can enable significant speedups for end-to-end inference on a range of popular models on the large-scale ImageNet image classification task on modern Intel CPUs, with little or no retraining cost. 
"}],"department":[{"_id":"DaAl"}],"day":"12","file":[{"date_updated":"2021-05-25T09:51:36Z","file_id":"9421","success":1,"date_created":"2021-05-25T09:51:36Z","file_size":741899,"relation":"main_file","checksum":"2aaaa7d7226e49161311d91627cf783b","content_type":"application/pdf","file_name":"2020_PMLR_Kurtz.pdf","creator":"kschuh","access_level":"open_access"}],"type":"conference","date_updated":"2023-02-23T13:57:24Z","month":"07","article_processing_charge":"No"}