SparseGPT: Massive language models can be accurately pruned in one-shot

Frantar E, Alistarh D-A. 2023. SparseGPT: Massive language models can be accurately pruned in one-shot. Proceedings of the 40th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 202, 10323–10337.

Download (ext.)
Conference Paper | Published | English

Scopus indexed
Department
Series Title
PMLR
Abstract
We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.
Publishing Year
Date Published
2023-07-30
Proceedings Title
Proceedings of the 40th International Conference on Machine Learning
Publisher
ML Research Press
Acknowledgement
The authors gratefully acknowledge funding from the European Research Council (ERC) under the European Union’s Horizon 2020 programme (grant agreement No. 805223 ScaleML), as well as experimental support from Eldar Kurtic, and from the IST Austria IT department, in particular Stefano Elefante, Andrei Hornoiu, and Alois Schloegl.
Acknowledged SSUs
Volume
202
Page
10323-10337
Conference
ICML: International Conference on Machine Learning
Conference Location
Honolulu, Hawaii, HI, United States
Conference Date
2023-07-23 – 2023-07-29
eISSN
IST-REx-ID

Cite this

Frantar E, Alistarh D-A. SparseGPT: Massive language models can be accurately pruned in one-shot. In: Proceedings of the 40th International Conference on Machine Learning. Vol 202. ML Research Press; 2023:10323-10337.
Frantar, E., & Alistarh, D.-A. (2023). SparseGPT: Massive language models can be accurately pruned in one-shot. In Proceedings of the 40th International Conference on Machine Learning (Vol. 202, pp. 10323–10337). Honolulu, Hawaii, HI, United States: ML Research Press.
Frantar, Elias, and Dan-Adrian Alistarh. “SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot.” In Proceedings of the 40th International Conference on Machine Learning, 202:10323–37. ML Research Press, 2023.
E. Frantar and D.-A. Alistarh, “SparseGPT: Massive language models can be accurately pruned in one-shot,” in Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, HI, United States, 2023, vol. 202, pp. 10323–10337.
Frantar E, Alistarh D-A. 2023. SparseGPT: Massive language models can be accurately pruned in one-shot. Proceedings of the 40th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 202, 10323–10337.
Frantar, Elias, and Dan-Adrian Alistarh. “SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot.” Proceedings of the 40th International Conference on Machine Learning, vol. 202, ML Research Press, 2023, pp. 10323–37.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]

Link(s) to Main File(s)
Access Level
OA Open Access

Export

Marked Publications

Open Data ISTA Research Explorer

Sources

arXiv 2301.00774

Search this title in

Google Scholar