---
_id: '15011'
abstract:
- lang: eng
  text: Pruning large language models (LLMs) from the BERT family has emerged as a
    standard compression benchmark, and several pruning methods have been proposed
    for this task. The recent “Sparsity May Cry” (SMC) benchmark put into question
    the validity of all existing methods, exhibiting a more complex setup where many
    known pruning methods appear to fail. We revisit the question of accurate BERT-pruning
    during fine-tuning on downstream datasets, and propose a set of general guidelines
    for successful pruning, even on the challenging SMC benchmark. First, we perform
    a cost-vs-benefits analysis of pruning model components, such as the embeddings
    and the classification head; second, we provide a simple-yet-general way of scaling
    training, sparsification and learning rate schedules relative to the desired target
    sparsity; finally, we investigate the importance of proper parametrization for
    Knowledge Distillation in the context of LLMs. Our simple insights lead to state-of-the-art
    results, both on classic BERT-pruning benchmarks, as well as on the SMC benchmark,
    showing that even classic gradual magnitude pruning (GMP) can yield competitive
    results, with the right approach.
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Eldar
  full_name: Kurtic, Eldar
  id: 47beb3a5-07b5-11eb-9b87-b108ec578218
  last_name: Kurtic
- first_name: Torsten
  full_name: Hoefler, Torsten
  last_name: Hoefler
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Kurtic E, Hoefler T, Alistarh D-A. How to prune your language model: Recovering
    accuracy on the “Sparsity May Cry” benchmark. In: <i>Proceedings of Machine Learning
    Research</i>. Vol 234. ML Research Press; 2024:542-553.'
  apa: 'Kurtic, E., Hoefler, T., &#38; Alistarh, D.-A. (2024). How to prune your language
    model: Recovering accuracy on the “Sparsity May Cry” benchmark. In <i>Proceedings
    of Machine Learning Research</i> (Vol. 234, pp. 542–553). Hong Kong, China: ML
    Research Press.'
  chicago: 'Kurtic, Eldar, Torsten Hoefler, and Dan-Adrian Alistarh. “How to Prune
    Your Language Model: Recovering Accuracy on the ‘Sparsity May Cry’ Benchmark.”
    In <i>Proceedings of Machine Learning Research</i>, 234:542–53. ML Research Press,
    2024.'
  ieee: 'E. Kurtic, T. Hoefler, and D.-A. Alistarh, “How to prune your language model:
    Recovering accuracy on the ‘Sparsity May Cry’ benchmark,” in <i>Proceedings of
    Machine Learning Research</i>, Hong Kong, China, 2024, vol. 234, pp. 542–553.'
  ista: 'Kurtic E, Hoefler T, Alistarh D-A. 2024. How to prune your language model:
    Recovering accuracy on the ‘Sparsity May Cry’ benchmark. Proceedings of Machine
    Learning Research. CPAL: Conference on Parsimony and Learning, PMLR, vol. 234,
    542–553.'
  mla: 'Kurtic, Eldar, et al. “How to Prune Your Language Model: Recovering Accuracy
    on the ‘Sparsity May Cry’ Benchmark.” <i>Proceedings of Machine Learning Research</i>,
    vol. 234, ML Research Press, 2024, pp. 542–53.'
  short: E. Kurtic, T. Hoefler, D.-A. Alistarh, in:, Proceedings of Machine Learning
    Research, ML Research Press, 2024, pp. 542–553.
conference:
  end_date: 2024-01-06
  location: Hong Kong, China
  name: 'CPAL: Conference on Parsimony and Learning'
  start_date: 2024-01-03
date_created: 2024-02-18T23:01:03Z
date_published: 2024-01-08T00:00:00Z
date_updated: 2024-02-26T10:30:52Z
day: '08'
department:
- _id: DaAl
external_id:
  arxiv:
  - '2312.13547'
intvolume: '234'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://proceedings.mlr.press/v234/kurtic24a
month: '01'
oa: 1
oa_version: Preprint
page: 542-553
publication: Proceedings of Machine Learning Research
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'How to prune your language model: Recovering accuracy on the "Sparsity May
  Cry" benchmark'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 234
year: '2024'
...
---
_id: '14460'
abstract:
- lang: eng
  text: We provide an efficient implementation of the backpropagation algorithm, specialized
    to the case where the weights of the neural network being trained are sparse.
    Our algorithm is general, as it applies to arbitrary (unstructured) sparsity and
    common layer types (e.g., convolutional or linear). We provide a fast vectorized
    implementation on commodity CPUs, and show that it can yield speedups in end-to-end
    runtime experiments, both in transfer learning using already-sparsified networks,
    and in training sparse networks from scratch. Thus, our results provide the first
    support for sparse training on commodity hardware.
acknowledgement: 'We would like to thank Elias Frantar for his valuable assistance
  and support at the outset of this project, and the anonymous ICML and SNN reviewers
  for very constructive feedback. EI was supported in part by the FWF DK VGSCO, grant
  agreement number W1260-N35. DA acknowledges generous ERC support, via Starting Grant
  805223 ScaleML.'
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Mahdi
  full_name: Nikdan, Mahdi
  id: 66374281-f394-11eb-9cf6-869147deecc0
  last_name: Nikdan
- first_name: Tommaso
  full_name: Pegolotti, Tommaso
  last_name: Pegolotti
- first_name: Eugenia B
  full_name: Iofinova, Eugenia B
  id: f9a17499-f6e0-11ea-865d-fdf9a3f77117
  last_name: Iofinova
  orcid: 0000-0002-7778-3221
- first_name: Eldar
  full_name: Kurtic, Eldar
  id: 47beb3a5-07b5-11eb-9b87-b108ec578218
  last_name: Kurtic
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Nikdan M, Pegolotti T, Iofinova EB, Kurtic E, Alistarh D-A. SparseProp: Efficient
    sparse backpropagation for faster training of neural networks at the edge. In:
    <i>Proceedings of the 40th International Conference on Machine Learning</i>. Vol
    202. ML Research Press; 2023:26215-26227.'
  apa: 'Nikdan, M., Pegolotti, T., Iofinova, E. B., Kurtic, E., &#38; Alistarh, D.-A.
    (2023). SparseProp: Efficient sparse backpropagation for faster training of neural
    networks at the edge. In <i>Proceedings of the 40th International Conference on
    Machine Learning</i> (Vol. 202, pp. 26215–26227). Honolulu, Hawaii, United
    States: ML Research Press.'
  chicago: 'Nikdan, Mahdi, Tommaso Pegolotti, Eugenia B Iofinova, Eldar Kurtic, and
    Dan-Adrian Alistarh. “SparseProp: Efficient Sparse Backpropagation for Faster
    Training of Neural Networks at the Edge.” In <i>Proceedings of the 40th International
    Conference on Machine Learning</i>, 202:26215–27. ML Research Press, 2023.'
  ieee: 'M. Nikdan, T. Pegolotti, E. B. Iofinova, E. Kurtic, and D.-A. Alistarh, “SparseProp:
    Efficient sparse backpropagation for faster training of neural networks at the
    edge,” in <i>Proceedings of the 40th International Conference on Machine Learning</i>,
    Honolulu, Hawaii, United States, 2023, vol. 202, pp. 26215–26227.'
  ista: 'Nikdan M, Pegolotti T, Iofinova EB, Kurtic E, Alistarh D-A. 2023. SparseProp:
    Efficient sparse backpropagation for faster training of neural networks at the
    edge. Proceedings of the 40th International Conference on Machine Learning. ICML:
    International Conference on Machine Learning, PMLR, vol. 202, 26215–26227.'
  mla: 'Nikdan, Mahdi, et al. “SparseProp: Efficient Sparse Backpropagation for Faster
    Training of Neural Networks at the Edge.” <i>Proceedings of the 40th International
    Conference on Machine Learning</i>, vol. 202, ML Research Press, 2023, pp. 26215–27.'
  short: M. Nikdan, T. Pegolotti, E.B. Iofinova, E. Kurtic, D.-A. Alistarh, in:, Proceedings
    of the 40th International Conference on Machine Learning, ML Research Press, 2023,
    pp. 26215–26227.
conference:
  end_date: 2023-07-29
  location: Honolulu, Hawaii, United States
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2023-07-23
date_created: 2023-10-29T23:01:17Z
date_published: 2023-07-30T00:00:00Z
date_updated: 2023-10-31T09:33:51Z
day: '30'
department:
- _id: DaAl
ec_funded: 1
external_id:
  arxiv:
  - '2302.04852'
intvolume: '202'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2302.04852
month: '07'
oa: 1
oa_version: Preprint
page: 26215-26227
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: Proceedings of the 40th International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'SparseProp: Efficient sparse backpropagation for faster training of neural
  networks at the edge'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 202
year: '2023'
...
---
_id: '13053'
abstract:
- lang: eng
  text: 'Deep neural networks (DNNs) often have to be compressed, via pruning and/or
    quantization, before they can be deployed in practical settings. In this work
    we propose a new compression-aware minimizer dubbed CrAM that modifies the optimization
    step in a principled way, in order to produce models whose local loss behavior
    is stable under compression operations such as pruning. Thus, dense models trained
    via CrAM should be compressible post-training, in a single step, without significant
    accuracy loss. Experimental results on standard benchmarks, such as residual networks
    for ImageNet classification and BERT models for language modelling, show that
    CrAM produces dense models that can be more accurate than the standard SGD/Adam-based
    baselines, but which are stable under weight pruning: specifically, we can prune
    models in one-shot to 70-80% sparsity with almost no accuracy loss, and to 90%
    with reasonable (∼1%) accuracy loss, which is competitive with gradual compression
    methods. Additionally, CrAM can produce sparse models which perform well for transfer
    learning, and it also works for semi-structured 2:4 pruning patterns supported
    by GPU hardware. The code for reproducing the results is available online.'
acknowledged_ssus:
- _id: ScienComp
acknowledgement: "AP, EK, DA received funding from the European Research Council (ERC)
  under the European\r\nUnion’s Horizon 2020 research and innovation programme (grant
  agreement No 805223 ScaleML). AV acknowledges the support of the French Agence Nationale
  de la Recherche (ANR), under grant ANR-21-CE48-0016 (project COMCOPT). We further
  acknowledge the support from the Scientific Service Units (SSU) of ISTA through
  resources provided by Scientific Computing (SciComp)-"
article_processing_charge: No
arxiv: 1
author:
- first_name: Elena-Alexandra
  full_name: Peste, Elena-Alexandra
  id: 32D78294-F248-11E8-B48F-1D18A9856A87
  last_name: Peste
- first_name: Adrian
  full_name: Vladu, Adrian
  last_name: Vladu
- first_name: Eldar
  full_name: Kurtic, Eldar
  id: 47beb3a5-07b5-11eb-9b87-b108ec578218
  last_name: Kurtic
- first_name: Christoph
  full_name: Lampert, Christoph
  id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
  last_name: Lampert
  orcid: 0000-0001-8622-7887
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Peste E-A, Vladu A, Kurtic E, Lampert C, Alistarh D-A. CrAM: A Compression-Aware
    Minimizer. In: <i>11th International Conference on Learning Representations</i>.'
  apa: 'Peste, E.-A., Vladu, A., Kurtic, E., Lampert, C., &#38; Alistarh, D.-A. (n.d.).
    CrAM: A Compression-Aware Minimizer. In <i>11th International Conference on Learning
    Representations</i>. Kigali, Rwanda.'
  chicago: 'Peste, Elena-Alexandra, Adrian Vladu, Eldar Kurtic, Christoph Lampert,
    and Dan-Adrian Alistarh. “CrAM: A Compression-Aware Minimizer.” In <i>11th International
    Conference on Learning Representations</i>, n.d.'
  ieee: 'E.-A. Peste, A. Vladu, E. Kurtic, C. Lampert, and D.-A. Alistarh, “CrAM:
    A Compression-Aware Minimizer,” in <i>11th International Conference on Learning
    Representations</i>, Kigali, Rwanda.'
  ista: 'Peste E-A, Vladu A, Kurtic E, Lampert C, Alistarh D-A. CrAM: A Compression-Aware
    Minimizer. 11th International Conference on Learning Representations. ICLR: International
    Conference on Learning Representations.'
  mla: 'Peste, Elena-Alexandra, et al. “CrAM: A Compression-Aware Minimizer.” <i>11th
    International Conference on Learning Representations</i>.'
  short: E.-A. Peste, A. Vladu, E. Kurtic, C. Lampert, D.-A. Alistarh, in:, 11th International
    Conference on Learning Representations, n.d.
conference:
  end_date: 2023-05-05
  location: 'Kigali, Rwanda'
  name: 'ICLR: International Conference on Learning Representations'
  start_date: 2023-05-01
date_created: 2023-05-23T11:36:18Z
date_published: 2023-05-01T00:00:00Z
date_updated: 2023-06-01T12:54:45Z
department:
- _id: GradSch
- _id: DaAl
- _id: ChLa
ec_funded: 1
external_id:
  arxiv:
  - '2207.14200'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://openreview.net/pdf?id=_eTZBs-yedr
month: '05'
oa: 1
oa_version: Preprint
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: 11th International Conference on Learning Representations
publication_status: accepted
quality_controlled: '1'
related_material:
  record:
  - id: '13074'
    relation: dissertation_contains
    status: public
status: public
title: 'CrAM: A Compression-Aware Minimizer'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
_id: '11463'
abstract:
- lang: eng
  text: "Efficiently approximating local curvature information of the loss function
    is a key tool for optimization and compression of deep neural networks. Yet, most
    existing methods to approximate second-order information have high computational\r\nor
    storage costs, which limits their practicality. In this work, we investigate matrix-free,
    linear-time approaches for estimating Inverse-Hessian Vector Products (IHVPs)
    for the case when the Hessian can be approximated as a sum of rank-one matrices,
    as in the classic approximation of the Hessian by the empirical Fisher matrix.
    We propose two new algorithms: the first is tailored towards network compression
    and can compute the IHVP for dimension d, if the Hessian is given as a sum of
    m rank-one matrices, using O(dm2) precomputation, O(dm) cost for computing the
    IHVP, and query cost O(m) for any single element of the inverse Hessian. The second
    algorithm targets an optimization setting, where we wish to compute the product
    between the inverse Hessian, estimated over a sliding window of optimization steps,
    and a given gradient direction, as required for preconditioned SGD. We give an
    algorithm with cost O(dm + m2) for computing the IHVP and O(dm + m3) for adding
    or removing any gradient from the sliding window. These\r\ntwo algorithms yield
    state-of-the-art results for network pruning and optimization with lower computational
    overhead relative to existing second-order methods. Implementations are available
    at [9] and [17]."
acknowledgement: We gratefully acknowledge funding from the European Research Council
  (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant
  agreement No 805223 ScaleML), as well as computational support from Amazon Web Services
  (AWS) EC2.
article_processing_charge: No
arxiv: 1
author:
- first_name: Elias
  full_name: Frantar, Elias
  id: 09a8f98d-ec99-11ea-ae11-c063a7b7fe5f
  last_name: Frantar
- first_name: Eldar
  full_name: Kurtic, Eldar
  id: 47beb3a5-07b5-11eb-9b87-b108ec578218
  last_name: Kurtic
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Frantar E, Kurtic E, Alistarh D-A. M-FAC: Efficient matrix-free approximations
    of second-order information. In: <i>35th Conference on Neural Information Processing
    Systems</i>. Vol 34. Curran Associates; 2021:14873-14886.'
  apa: 'Frantar, E., Kurtic, E., &#38; Alistarh, D.-A. (2021). M-FAC: Efficient matrix-free
    approximations of second-order information. In <i>35th Conference on Neural Information
    Processing Systems</i> (Vol. 34, pp. 14873–14886). Virtual, Online: Curran Associates.'
  chicago: 'Frantar, Elias, Eldar Kurtic, and Dan-Adrian Alistarh. “M-FAC: Efficient
    Matrix-Free Approximations of Second-Order Information.” In <i>35th Conference
    on Neural Information Processing Systems</i>, 34:14873–86. Curran Associates,
    2021.'
  ieee: 'E. Frantar, E. Kurtic, and D.-A. Alistarh, “M-FAC: Efficient matrix-free
    approximations of second-order information,” in <i>35th Conference on Neural Information
    Processing Systems</i>, Virtual, Online, 2021, vol. 34, pp. 14873–14886.'
  ista: 'Frantar E, Kurtic E, Alistarh D-A. 2021. M-FAC: Efficient matrix-free approximations
    of second-order information. 35th Conference on Neural Information Processing
    Systems. NeurIPS: Neural Information Processing Systems vol. 34, 14873–14886.'
  mla: 'Frantar, Elias, et al. “M-FAC: Efficient Matrix-Free Approximations of Second-Order
    Information.” <i>35th Conference on Neural Information Processing Systems</i>,
    vol. 34, Curran Associates, 2021, pp. 14873–86.'
  short: E. Frantar, E. Kurtic, D.-A. Alistarh, in:, 35th Conference on Neural Information
    Processing Systems, Curran Associates, 2021, pp. 14873–14886.
conference:
  end_date: 2021-12-14
  location: Virtual, Online
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2021-12-06
date_created: 2022-06-26T22:01:35Z
date_published: 2021-12-06T00:00:00Z
date_updated: 2022-06-27T07:05:12Z
day: '06'
department:
- _id: DaAl
ec_funded: 1
external_id:
  arxiv:
  - '2010.08222'
intvolume: '34'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://proceedings.neurips.cc/paper/2021/file/7cfd5df443b4eb0d69886a583b33de4c-Paper.pdf
month: '12'
oa: 1
oa_version: Published Version
page: 14873-14886
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: 35th Conference on Neural Information Processing Systems
publication_identifier:
  isbn:
  - '9781713845393'
  issn:
  - 1049-5258
publication_status: published
publisher: Curran Associates
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'M-FAC: Efficient matrix-free approximations of second-order information'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 34
year: '2021'
...
