---
_id: '15011'
abstract:
- lang: eng
  text: Pruning large language models (LLMs) from the BERT family has emerged as a
    standard compression benchmark, and several pruning methods have been proposed
    for this task. The recent “Sparsity May Cry” (SMC) benchmark put into question
    the validity of all existing methods, exhibiting a more complex setup where many
    known pruning methods appear to fail. We revisit the question of accurate BERT-pruning
    during fine-tuning on downstream datasets, and propose a set of general guidelines
    for successful pruning, even on the challenging SMC benchmark. First, we perform
    a cost-vs-benefits analysis of pruning model components, such as the embeddings
    and the classification head; second, we provide a simple-yet-general way of scaling
    training, sparsification and learning rate schedules relative to the desired target
    sparsity; finally, we investigate the importance of proper parametrization for
    Knowledge Distillation in the context of LLMs. Our simple insights lead to state-of-the-art
    results, both on classic BERT-pruning benchmarks, as well as on the SMC benchmark,
    showing that even classic gradual magnitude pruning (GMP) can yield competitive
    results, with the right approach.
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Eldar
  full_name: Kurtic, Eldar
  id: 47beb3a5-07b5-11eb-9b87-b108ec578218
  last_name: Kurtic
- first_name: Torsten
  full_name: Hoefler, Torsten
  last_name: Hoefler
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Kurtic E, Hoefler T, Alistarh D-A. How to prune your language model: Recovering
    accuracy on the “Sparsity May Cry” benchmark. In: <i>Proceedings of Machine Learning
    Research</i>. Vol 234. ML Research Press; 2024:542-553.'
  apa: 'Kurtic, E., Hoefler, T., &#38; Alistarh, D.-A. (2024). How to prune your language
    model: Recovering accuracy on the “Sparsity May Cry” benchmark. In <i>Proceedings
    of Machine Learning Research</i> (Vol. 234, pp. 542–553). Hong Kong, China: ML
    Research Press.'
  chicago: 'Kurtic, Eldar, Torsten Hoefler, and Dan-Adrian Alistarh. “How to Prune
    Your Language Model: Recovering Accuracy on the ‘Sparsity May Cry’ Benchmark.”
    In <i>Proceedings of Machine Learning Research</i>, 234:542–53. ML Research Press,
    2024.'
  ieee: 'E. Kurtic, T. Hoefler, and D.-A. Alistarh, “How to prune your language model:
    Recovering accuracy on the ‘Sparsity May Cry’ benchmark,” in <i>Proceedings of
    Machine Learning Research</i>, Hong Kong, China, 2024, vol. 234, pp. 542–553.'
  ista: 'Kurtic E, Hoefler T, Alistarh D-A. 2024. How to prune your language model:
    Recovering accuracy on the ‘Sparsity May Cry’ benchmark. Proceedings of Machine
    Learning Research. CPAL: Conference on Parsimony and Learning, PMLR, vol. 234,
    542–553.'
  mla: 'Kurtic, Eldar, et al. “How to Prune Your Language Model: Recovering Accuracy
    on the ‘Sparsity May Cry’ Benchmark.” <i>Proceedings of Machine Learning Research</i>,
    vol. 234, ML Research Press, 2024, pp. 542–53.'
  short: E. Kurtic, T. Hoefler, D.-A. Alistarh, in:, Proceedings of Machine Learning
    Research, ML Research Press, 2024, pp. 542–553.
conference:
  end_date: 2024-01-06
  location: Hong Kong, China
  name: 'CPAL: Conference on Parsimony and Learning'
  start_date: 2024-01-03
date_created: 2024-02-18T23:01:03Z
date_published: 2024-01-08T00:00:00Z
date_updated: 2024-02-26T10:30:52Z
day: '08'
department:
- _id: DaAl
external_id:
  arxiv:
  - '2312.13547'
intvolume: '       234'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://proceedings.mlr.press/v234/kurtic24a
month: '01'
oa: 1
oa_version: Preprint
page: 542-553
publication: Proceedings of Machine Learning Research
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'How to prune your language model: Recovering accuracy on the "Sparsity May
  Cry" benchmark'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 234
year: '2024'
...
---
_id: '14458'
abstract:
- lang: eng
  text: 'We show for the first time that large-scale generative pretrained transformer
    (GPT) family models can be pruned to at least 50% sparsity in one-shot, without
    any retraining, at minimal loss of accuracy. This is achieved via a new pruning
    method called SparseGPT, specifically designed to work efficiently and accurately
    on massive GPT-family models. We can execute SparseGPT on the largest available
    open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach
    60% unstructured sparsity with negligible increase in perplexity: remarkably,
    more than 100 billion weights from these models can be ignored at inference time.
    SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible
    with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.'
acknowledged_ssus:
- _id: ScienComp
acknowledgement: The authors gratefully acknowledge funding from the European Research
  Council (ERC) under the European Union’s Horizon 2020 programme (grant agreement
  No. 805223 ScaleML), as well as experimental support from Eldar Kurtic, and from
  the IST Austria IT department, in particular Stefano Elefante, Andrei Hornoiu, and
  Alois Schloegl.
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Elias
  full_name: Frantar, Elias
  id: 09a8f98d-ec99-11ea-ae11-c063a7b7fe5f
  last_name: Frantar
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Frantar E, Alistarh D-A. SparseGPT: Massive language models can be accurately
    pruned in one-shot. In: <i>Proceedings of the 40th International Conference on
    Machine Learning</i>. Vol 202. ML Research Press; 2023:10323-10337.'
  apa: 'Frantar, E., &#38; Alistarh, D.-A. (2023). SparseGPT: Massive language models
    can be accurately pruned in one-shot. In <i>Proceedings of the 40th International
    Conference on Machine Learning</i> (Vol. 202, pp. 10323–10337). Honolulu, HI,
    United States: ML Research Press.'
  chicago: 'Frantar, Elias, and Dan-Adrian Alistarh. “SparseGPT: Massive Language
    Models Can Be Accurately Pruned in One-Shot.” In <i>Proceedings of the 40th International
    Conference on Machine Learning</i>, 202:10323–37. ML Research Press, 2023.'
  ieee: 'E. Frantar and D.-A. Alistarh, “SparseGPT: Massive language models can be
    accurately pruned in one-shot,” in <i>Proceedings of the 40th International Conference
    on Machine Learning</i>, Honolulu, HI, United States, 2023, vol. 202,
    pp. 10323–10337.'
  ista: 'Frantar E, Alistarh D-A. 2023. SparseGPT: Massive language models can be
    accurately pruned in one-shot. Proceedings of the 40th International Conference
    on Machine Learning. ICML: International Conference on Machine Learning, PMLR,
    vol. 202, 10323–10337.'
  mla: 'Frantar, Elias, and Dan-Adrian Alistarh. “SparseGPT: Massive Language Models
    Can Be Accurately Pruned in One-Shot.” <i>Proceedings of the 40th International
    Conference on Machine Learning</i>, vol. 202, ML Research Press, 2023, pp. 10323–37.'
  short: E. Frantar, D.-A. Alistarh, in:, Proceedings of the 40th International Conference
    on Machine Learning, ML Research Press, 2023, pp. 10323–10337.
conference:
  end_date: 2023-07-29
  location: Honolulu, HI, United States
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2023-07-23
date_created: 2023-10-29T23:01:16Z
date_published: 2023-07-30T00:00:00Z
date_updated: 2023-10-31T09:59:42Z
day: '30'
department:
- _id: DaAl
ec_funded: 1
external_id:
  arxiv:
  - '2301.00774'
intvolume: '       202'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2301.00774
month: '07'
oa: 1
oa_version: Preprint
page: 10323-10337
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: Proceedings of the 40th International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'SparseGPT: Massive language models can be accurately pruned in one-shot'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 202
year: '2023'
...
---
_id: '14459'
abstract:
- lang: eng
  text: Autoencoders are a popular model in many branches of machine learning and
    lossy data compression. However, their fundamental limits, the performance of
    gradient methods and the features learnt during optimization remain poorly understood,
    even in the two-layer setting. In fact, earlier work has considered either linear
    autoencoders or specific training regimes (leading to vanishing or diverging compression
    rates). Our paper addresses this gap by focusing on non-linear two-layer autoencoders
    trained in the challenging proportional regime in which the input dimension scales
    linearly with the size of the representation. Our results characterize the minimizers
    of the population risk, and show that such minimizers are achieved by gradient
    methods; their structure is also unveiled, thus leading to a concise description
    of the features obtained via training. For the special case of a sign activation
    function, our analysis establishes the fundamental limits for the lossy compression
    of Gaussian sources via (shallow) autoencoders. Finally, while the results are
    proved for Gaussian data, numerical simulations on standard datasets display the
    universality of the theoretical predictions.
acknowledgement: Aleksandr Shevchenko, Kevin Kögler and Marco Mondelli are supported
  by the 2019 Lopez-Loreta Prize. Hamed Hassani acknowledges the support by the NSF
  CIF award (1910056) and the NSF Institute for CORE Emerging Methods in Data Science
  (EnCORE).
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Aleksandr
  full_name: Shevchenko, Aleksandr
  id: F2B06EC2-C99E-11E9-89F0-752EE6697425
  last_name: Shevchenko
- first_name: Kevin
  full_name: Kögler, Kevin
  id: 94ec913c-dc85-11ea-9058-e5051ab2428b
  last_name: Kögler
- first_name: Hamed
  full_name: Hassani, Hamed
  last_name: Hassani
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
citation:
  ama: 'Shevchenko A, Kögler K, Hassani H, Mondelli M. Fundamental limits of two-layer
    autoencoders, and achieving them with gradient methods. In: <i>Proceedings of
    the 40th International Conference on Machine Learning</i>. Vol 202. ML Research
    Press; 2023:31151-31209.'
  apa: 'Shevchenko, A., Kögler, K., Hassani, H., &#38; Mondelli, M. (2023). Fundamental
    limits of two-layer autoencoders, and achieving them with gradient methods. In
    <i>Proceedings of the 40th International Conference on Machine Learning</i> (Vol.
    202, pp. 31151–31209). Honolulu, HI, United States: ML Research Press.'
  chicago: Shevchenko, Aleksandr, Kevin Kögler, Hamed Hassani, and Marco Mondelli.
    “Fundamental Limits of Two-Layer Autoencoders, and Achieving Them with Gradient
    Methods.” In <i>Proceedings of the 40th International Conference on Machine Learning</i>,
    202:31151–209. ML Research Press, 2023.
  ieee: A. Shevchenko, K. Kögler, H. Hassani, and M. Mondelli, “Fundamental limits
    of two-layer autoencoders, and achieving them with gradient methods,” in <i>Proceedings
    of the 40th International Conference on Machine Learning</i>, Honolulu, HI, United
    States, 2023, vol. 202, pp. 31151–31209.
  ista: 'Shevchenko A, Kögler K, Hassani H, Mondelli M. 2023. Fundamental limits of
    two-layer autoencoders, and achieving them with gradient methods. Proceedings
    of the 40th International Conference on Machine Learning. ICML: International
    Conference on Machine Learning, PMLR, vol. 202, 31151–31209.'
  mla: Shevchenko, Aleksandr, et al. “Fundamental Limits of Two-Layer Autoencoders,
    and Achieving Them with Gradient Methods.” <i>Proceedings of the 40th International
    Conference on Machine Learning</i>, vol. 202, ML Research Press, 2023, pp. 31151–209.
  short: A. Shevchenko, K. Kögler, H. Hassani, M. Mondelli, in:, Proceedings of the
    40th International Conference on Machine Learning, ML Research Press, 2023, pp.
    31151–31209.
conference:
  end_date: 2023-07-29
  location: Honolulu, HI, United States
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2023-07-23
date_created: 2023-10-29T23:01:17Z
date_published: 2023-07-30T00:00:00Z
date_updated: 2024-09-10T13:03:19Z
day: '30'
department:
- _id: MaMo
- _id: DaAl
external_id:
  arxiv:
  - '2212.13468'
intvolume: '       202'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2212.13468
month: '07'
oa: 1
oa_version: Preprint
page: 31151-31209
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Proceedings of the 40th International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: Fundamental limits of two-layer autoencoders, and achieving them with gradient
  methods
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 202
year: '2023'
...
---
_id: '14460'
abstract:
- lang: eng
  text: We provide an efficient implementation of the backpropagation algorithm, specialized
    to the case where the weights of the neural network being trained are sparse.
    Our algorithm is general, as it applies to arbitrary (unstructured) sparsity and
    common layer types (e.g., convolutional or linear). We provide a fast vectorized
    implementation on commodity CPUs, and show that it can yield speedups in end-to-end
    runtime experiments, both in transfer learning using already-sparsified networks,
    and in training sparse networks from scratch. Thus, our results provide the first
    support for sparse training on commodity hardware.
acknowledgement: 'We would like to thank Elias Frantar for his valuable assistance
  and support at the outset of this project, and the anonymous ICML and SNN reviewers
  for very constructive feedback. EI was supported in part by the FWF DK VGSCO, grant
  agreement number W1260-N35. DA acknowledges generous ERC support, via Starting Grant
  805223 ScaleML. '
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Mahdi
  full_name: Nikdan, Mahdi
  id: 66374281-f394-11eb-9cf6-869147deecc0
  last_name: Nikdan
- first_name: Tommaso
  full_name: Pegolotti, Tommaso
  last_name: Pegolotti
- first_name: Eugenia B
  full_name: Iofinova, Eugenia B
  id: f9a17499-f6e0-11ea-865d-fdf9a3f77117
  last_name: Iofinova
  orcid: 0000-0002-7778-3221
- first_name: Eldar
  full_name: Kurtic, Eldar
  id: 47beb3a5-07b5-11eb-9b87-b108ec578218
  last_name: Kurtic
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Nikdan M, Pegolotti T, Iofinova EB, Kurtic E, Alistarh D-A. SparseProp: Efficient
    sparse backpropagation for faster training of neural networks at the edge. In:
    <i>Proceedings of the 40th International Conference on Machine Learning</i>. Vol
    202. ML Research Press; 2023:26215-26227.'
  apa: 'Nikdan, M., Pegolotti, T., Iofinova, E. B., Kurtic, E., &#38; Alistarh, D.-A.
    (2023). SparseProp: Efficient sparse backpropagation for faster training of neural
    networks at the edge. In <i>Proceedings of the 40th International Conference on
    Machine Learning</i> (Vol. 202, pp. 26215–26227). Honolulu, HI, United States:
    ML Research Press.'
  chicago: 'Nikdan, Mahdi, Tommaso Pegolotti, Eugenia B Iofinova, Eldar Kurtic, and
    Dan-Adrian Alistarh. “SparseProp: Efficient Sparse Backpropagation for Faster
    Training of Neural Networks at the Edge.” In <i>Proceedings of the 40th International
    Conference on Machine Learning</i>, 202:26215–27. ML Research Press, 2023.'
  ieee: 'M. Nikdan, T. Pegolotti, E. B. Iofinova, E. Kurtic, and D.-A. Alistarh, “SparseProp:
    Efficient sparse backpropagation for faster training of neural networks at the
    edge,” in <i>Proceedings of the 40th International Conference on Machine Learning</i>,
    Honolulu, HI, United States, 2023, vol. 202, pp. 26215–26227.'
  ista: 'Nikdan M, Pegolotti T, Iofinova EB, Kurtic E, Alistarh D-A. 2023. SparseProp:
    Efficient sparse backpropagation for faster training of neural networks at the
    edge. Proceedings of the 40th International Conference on Machine Learning. ICML:
    International Conference on Machine Learning, PMLR, vol. 202, 26215–26227.'
  mla: 'Nikdan, Mahdi, et al. “SparseProp: Efficient Sparse Backpropagation for Faster
    Training of Neural Networks at the Edge.” <i>Proceedings of the 40th International
    Conference on Machine Learning</i>, vol. 202, ML Research Press, 2023, pp. 26215–27.'
  short: M. Nikdan, T. Pegolotti, E.B. Iofinova, E. Kurtic, D.-A. Alistarh, in:, Proceedings
    of the 40th International Conference on Machine Learning, ML Research Press, 2023,
    pp. 26215–26227.
conference:
  end_date: 2023-07-29
  location: Honolulu, HI, United States
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2023-07-23
date_created: 2023-10-29T23:01:17Z
date_published: 2023-07-30T00:00:00Z
date_updated: 2023-10-31T09:33:51Z
day: '30'
department:
- _id: DaAl
ec_funded: 1
external_id:
  arxiv:
  - '2302.04852'
intvolume: '       202'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2302.04852
month: '07'
oa: 1
oa_version: Preprint
page: 26215-26227
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: Proceedings of the 40th International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'SparseProp: Efficient sparse backpropagation for faster training of neural
  networks at the edge'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 202
year: '2023'
...
---
_id: '14461'
abstract:
- lang: eng
  text: 'Communication-reduction techniques are a popular way to improve scalability
    in data-parallel training of deep neural networks (DNNs). The recent emergence
    of large language models such as GPT has created the need for new approaches to
    exploit data-parallelism. Among these, fully-sharded data parallel (FSDP) training
    is highly popular, yet it still encounters scalability bottlenecks. One reason
    is that applying compression techniques to FSDP is challenging: as the vast majority
    of the communication involves the model’s weights, direct compression alters convergence
    and leads to accuracy loss. We present QSDP, a variant of FSDP which supports
    both gradient and weight quantization with theoretical guarantees, is simple to
    implement and has essentially no overheads. To derive QSDP we prove that a natural
    modification of SGD achieves convergence even when we only maintain quantized
    weights, and thus the domain over which we train consists of quantized points
    and is, therefore, highly non-convex. We validate this approach by training GPT-family
    models with up to 1.3 billion parameters on a multi-node cluster. Experiments
    show that QSDP preserves model accuracy, while completely removing the communication
    bottlenecks of FSDP, providing end-to-end speedups of up to 2.2x.'
acknowledged_ssus:
- _id: ScienComp
acknowledgement: The authors gratefully acknowledge funding from the European Research
  Council (ERC) under the European Union’s Horizon 2020 research and innovation programme
  (grant agreement No 805223 ScaleML), as well as experimental support from the IST
  Austria IT department, in particular Stefano Elefante, Andrei Hornoiu, and Alois
  Schloegl. AV acknowledges the support of the French Agence Nationale de la Recherche
  (ANR), under grant ANR-21-CE48-0016 (project COMCOPT), the support of Fondation
  Hadamard with a PRMO grant, and the support of CNRS with a CoopIntEER IEA grant
  (project ALFRED).
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Ilia
  full_name: Markov, Ilia
  id: D0CF4148-C985-11E9-8066-0BDEE5697425
  last_name: Markov
- first_name: Adrian
  full_name: Vladu, Adrian
  last_name: Vladu
- first_name: Qi
  full_name: Guo, Qi
  last_name: Guo
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Markov I, Vladu A, Guo Q, Alistarh D-A. Quantized distributed training of
    large models with convergence guarantees. In: <i>Proceedings of the 40th International
    Conference on Machine Learning</i>. Vol 202. ML Research Press; 2023:24020-24044.'
  apa: 'Markov, I., Vladu, A., Guo, Q., &#38; Alistarh, D.-A. (2023). Quantized distributed
    training of large models with convergence guarantees. In <i>Proceedings of the
    40th International Conference on Machine Learning</i> (Vol. 202, pp. 24020–24044).
    Honolulu, HI, United States: ML Research Press.'
  chicago: Markov, Ilia, Adrian Vladu, Qi Guo, and Dan-Adrian Alistarh. “Quantized
    Distributed Training of Large Models with Convergence Guarantees.” In <i>Proceedings
    of the 40th International Conference on Machine Learning</i>, 202:24020–44. ML
    Research Press, 2023.
  ieee: I. Markov, A. Vladu, Q. Guo, and D.-A. Alistarh, “Quantized distributed training
    of large models with convergence guarantees,” in <i>Proceedings of the 40th International
    Conference on Machine Learning</i>, Honolulu, HI, United States, 2023,
    vol. 202, pp. 24020–24044.
  ista: 'Markov I, Vladu A, Guo Q, Alistarh D-A. 2023. Quantized distributed training
    of large models with convergence guarantees. Proceedings of the 40th International
    Conference on Machine Learning. ICML: International Conference on Machine Learning,
    PMLR, vol. 202, 24020–24044.'
  mla: Markov, Ilia, et al. “Quantized Distributed Training of Large Models with Convergence
    Guarantees.” <i>Proceedings of the 40th International Conference on Machine Learning</i>,
    vol. 202, ML Research Press, 2023, pp. 24020–44.
  short: I. Markov, A. Vladu, Q. Guo, D.-A. Alistarh, in:, Proceedings of the 40th
    International Conference on Machine Learning, ML Research Press, 2023, pp. 24020–24044.
conference:
  end_date: 2023-07-29
  location: Honolulu, HI, United States
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2023-07-23
date_created: 2023-10-29T23:01:17Z
date_published: 2023-07-30T00:00:00Z
date_updated: 2023-10-31T09:40:45Z
day: '30'
department:
- _id: DaAl
ec_funded: 1
external_id:
  arxiv:
  - '2302.02390'
intvolume: '       202'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2302.02390
month: '07'
oa: 1
oa_version: Preprint
page: 24020-24044
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: Proceedings of the 40th International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: Quantized distributed training of large models with convergence guarantees
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 202
year: '2023'
...
---
_id: '14462'
abstract:
- lang: eng
  text: "We study fine-grained error bounds for differentially private algorithms
    for counting under continual observation. Our main insight is that the matrix
    mechanism when using lower-triangular matrices can be used in the continual observation
    model. More specifically, we give an explicit factorization for the counting matrix
    Mcount and upper bound the error explicitly. We also give a fine-grained analysis,
    specifying the exact constant in the upper bound. Our analysis is based on upper
    and lower bounds of the completely bounded norm (cb-norm) of Mcount. Along
    the way, we improve the 28-year-old best-known bound of Mathias (SIAM Journal
    on Matrix Analysis and Applications, 1993) on the cb-norm of Mcount for a large
    range of the dimension of Mcount. Furthermore, we are the first to give concrete
    error bounds for various problems under continual observation such as binary counting,
    maintaining a histogram, releasing an approximately cut-preserving synthetic graph,
    many graph-based statistics, and substring and episode counting. Finally, we note
    that our result can be used to get a fine-grained error bound for non-interactive
    local learning and the first lower bounds on the additive error for (ϵ,δ)-differentially-private
    counting under continual observation. Subsequent to this work, Henzinger et al.
    (SODA, 2023) showed that our factorization also achieves fine-grained mean-squared
    error."
acknowledgement: "This project has received funding from the European Research Council
  (ERC) under the European Union’s Horizon 2020 research and innovation programme
  (Grant agreement No. 101019564 “The Design of Modern Fully Dynamic Data Structures
  (MoDynStruct)”) and from the Austrian Science Fund (FWF) project Z 422-N, and project
  “Fast Algorithms for a Reactive Network Layer (ReactNet)”, P 33775-N, with additional
  funding from the netidee SCIENCE Stiftung, 2020–2024. JU’s research was funded
  by a Decanal Research Grant. A part of this work was done when JU was visiting
  Indian Statistical Institute, Delhi. The authors would like to thank Rajat Bhatia,
  Aleksandar Nikolov, Shanta Laisharam, Vern Paulsen, Ryan Rogers, Abhradeep Thakurta,
  and Sarvagya Upadhyay for useful discussions."
alternative_title:
- PMLR
article_processing_charge: No
author:
- first_name: Hendrik
  full_name: Fichtenberger, Hendrik
  last_name: Fichtenberger
- first_name: Monika H
  full_name: Henzinger, Monika H
  id: 540c9bbd-f2de-11ec-812d-d04a5be85630
  last_name: Henzinger
  orcid: 0000-0002-5008-6530
- first_name: Jalaj
  full_name: Upadhyay, Jalaj
  last_name: Upadhyay
citation:
  ama: 'Fichtenberger H, Henzinger MH, Upadhyay J. Constant matters: Fine-grained
    error bound on differentially private continual observation. In: <i>Proceedings
    of the 40th International Conference on Machine Learning</i>. Vol 202. ML Research
    Press; 2023:10072-10092.'
  apa: 'Fichtenberger, H., Henzinger, M. H., &#38; Upadhyay, J. (2023). Constant matters:
    Fine-grained error bound on differentially private continual observation. In <i>Proceedings
    of the 40th International Conference on Machine Learning</i> (Vol. 202, pp. 10072–10092).
    Honolulu, HI, United States: ML Research Press.'
  chicago: 'Fichtenberger, Hendrik, Monika H Henzinger, and Jalaj Upadhyay. “Constant
    Matters: Fine-Grained Error Bound on Differentially Private Continual Observation.”
    In <i>Proceedings of the 40th International Conference on Machine Learning</i>,
    202:10072–92. ML Research Press, 2023.'
  ieee: 'H. Fichtenberger, M. H. Henzinger, and J. Upadhyay, “Constant matters: Fine-grained
    error bound on differentially private continual observation,” in <i>Proceedings
    of the 40th International Conference on Machine Learning</i>, Honolulu, HI, United
    States, 2023, vol. 202, pp. 10072–10092.'
  ista: 'Fichtenberger H, Henzinger MH, Upadhyay J. 2023. Constant matters: Fine-grained
    error bound on differentially private continual observation. Proceedings of the
    40th International Conference on Machine Learning. ICML: International Conference
    on Machine Learning, PMLR, vol. 202, 10072–10092.'
  mla: 'Fichtenberger, Hendrik, et al. “Constant Matters: Fine-Grained Error Bound
    on Differentially Private Continual Observation.” <i>Proceedings of the 40th International
    Conference on Machine Learning</i>, vol. 202, ML Research Press, 2023, pp. 10072–92.'
  short: H. Fichtenberger, M.H. Henzinger, J. Upadhyay, in:, Proceedings of the 40th
    International Conference on Machine Learning, ML Research Press, 2023, pp. 10072–10092.
conference:
  end_date: 2023-07-29
  location: Honolulu, HI, United States
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2023-07-23
date_created: 2023-10-29T23:01:17Z
date_published: 2023-07-30T00:00:00Z
date_updated: 2025-07-15T12:51:52Z
day: '30'
department:
- _id: MoHe
ec_funded: 1
intvolume: '       202'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://proceedings.mlr.press/v202/fichtenberger23a/fichtenberger23a.pdf
month: '07'
oa: 1
oa_version: Published Version
page: 10072-10092
project:
- _id: bd9ca328-d553-11ed-ba76-dc4f890cfe62
  call_identifier: H2020
  grant_number: '101019564'
  name: The design and evaluation of modern fully dynamic data structures
- _id: 34def286-11ca-11ed-8bc3-da5948e1613c
  grant_number: Z00422
  name: Wittgenstein Award - Monika Henzinger
- _id: bd9e3a2e-d553-11ed-ba76-8aa684ce17fe
  grant_number: 'P33775'
  name: Fast Algorithms for a Reactive Network Layer
publication: Proceedings of the 40th International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'Constant matters: Fine-grained error bound on differentially private continual
  observation'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 202
year: '2023'
...
---
_id: '13239'
abstract:
- lang: eng
  text: Brains are thought to engage in predictive learning - learning to predict
    upcoming stimuli - to construct an internal model of their environment. This is
    especially notable for spatial navigation, as first described by Tolman’s latent
    learning tasks. However, predictive learning has also been observed in sensory
    cortex, in settings unrelated to spatial navigation. Apart from normative frameworks
    such as active inference or efficient coding, what could be the utility of learning
    to predict the patterns of occurrence of correlated stimuli? Here we show that
    prediction, and thereby the construction of an internal model of sequential stimuli,
    can bootstrap the learning process of a working memory task in a recurrent neural
    network. We implemented predictive learning alongside working memory match-tasks,
    and networks emerged to solve the prediction task first by encoding information
    across time to predict upcoming stimuli, and then eavesdropped on this solution
    to solve the matching task. Eavesdropping was most beneficial when neural resources
    were limited. Hence, predictive learning acts as a general neural mechanism to
    learn to store sensory information that can later be essential for working memory
    tasks.
acknowledgement: "The authors would like to thank members of the Vogels lab and Manohar
  lab, as well as Adam Packer, Andrew Saxe, Stefano Sarao Mannelli and Jacob Bakermans
  for fruitful discussions and comments on earlier versions of the manuscript.\r\nTLvdP
  was supported by funding from the Biotechnology and Biological Sciences Research
  Council (BBSRC) [grant number BB/M011224/1]. TPV was supported by an ERC Consolidator
  Grant (SYNAPSEEK). SGM was funded by an MRC Clinician Scientist Fellowship MR/P00878X
  and Leverhulme Grant RPG-2018-310."
article_processing_charge: No
author:
- first_name: Thijs L.
  full_name: Van Der Plas, Thijs L.
  last_name: Van Der Plas
- first_name: Tim P
  full_name: Vogels, Tim P
  id: CB6FF8D2-008F-11EA-8E08-2637E6697425
  last_name: Vogels
  orcid: 0000-0003-3295-6181
- first_name: Sanjay G.
  full_name: Manohar, Sanjay G.
  last_name: Manohar
citation:
  ama: 'Van Der Plas TL, Vogels TP, Manohar SG. Predictive learning enables neural
    networks to learn complex working memory tasks. In: <i>Proceedings of Machine
    Learning Research</i>. Vol 199. ML Research Press; 2022:518-531.'
  apa: Van Der Plas, T. L., Vogels, T. P., &#38; Manohar, S. G. (2022). Predictive
    learning enables neural networks to learn complex working memory tasks. In <i>Proceedings
    of Machine Learning Research</i> (Vol. 199, pp. 518–531). ML Research Press.
  chicago: Van Der Plas, Thijs L., Tim P Vogels, and Sanjay G. Manohar. “Predictive
    Learning Enables Neural Networks to Learn Complex Working Memory Tasks.” In <i>Proceedings
    of Machine Learning Research</i>, 199:518–31. ML Research Press, 2022.
  ieee: T. L. Van Der Plas, T. P. Vogels, and S. G. Manohar, “Predictive learning
    enables neural networks to learn complex working memory tasks,” in <i>Proceedings
    of Machine Learning Research</i>, 2022, vol. 199, pp. 518–531.
  ista: Van Der Plas TL, Vogels TP, Manohar SG. 2022. Predictive learning enables
    neural networks to learn complex working memory tasks. Proceedings of Machine
    Learning Research. vol. 199, 518–531.
  mla: Van Der Plas, Thijs L., et al. “Predictive Learning Enables Neural Networks
    to Learn Complex Working Memory Tasks.” <i>Proceedings of Machine Learning Research</i>,
    vol. 199, ML Research Press, 2022, pp. 518–31.
  short: T.L. Van Der Plas, T.P. Vogels, S.G. Manohar, in:, Proceedings of Machine
    Learning Research, ML Research Press, 2022, pp. 518–531.
date_created: 2023-07-16T22:01:12Z
date_published: 2022-12-01T00:00:00Z
date_updated: 2023-07-18T06:36:28Z
day: '01'
ddc:
- '000'
department:
- _id: TiVo
ec_funded: 1
file:
- access_level: open_access
  checksum: 7530a93ef42e10b4db1e5e4b69796e93
  content_type: application/pdf
  creator: dernst
  date_created: 2023-07-18T06:32:38Z
  date_updated: 2023-07-18T06:32:38Z
  file_id: '13243'
  file_name: 2022_PMLR_vanderPlas.pdf
  file_size: 585135
  relation: main_file
  success: 1
file_date_updated: 2023-07-18T06:32:38Z
has_accepted_license: '1'
intvolume: '       199'
language:
- iso: eng
month: '12'
oa: 1
oa_version: Published Version
page: 518-531
project:
- _id: 0aacfa84-070f-11eb-9043-d7eb2c709234
  call_identifier: H2020
  grant_number: '819603'
  name: Learning the shape of synaptic plasticity rules for neuronal architectures
    and function through machine learning.
publication: Proceedings of Machine Learning Research
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: Predictive learning enables neural networks to learn complex working memory
  tasks
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 199
year: '2022'
...
---
_id: '13241'
abstract:
- lang: eng
  text: Addressing fairness concerns about machine learning models is a crucial step
    towards their long-term adoption in real-world automated systems. Many approaches
    for training fair models from data have been developed and an implicit assumption
    about such algorithms is that they are able to recover a fair model, despite potential
    historical biases in the data. In this work we show a number of impossibility
    results that indicate that there is no learning algorithm that can recover a fair
    model when a proportion of the dataset is subject to arbitrary manipulations.
    Specifically, we prove that there are situations in which an adversary can force
    any learner to return a biased classifier, with or without degrading accuracy,
    and that the strength of this bias increases for learning problems with underrepresented
    protected groups in the data. Our results emphasize the importance of studying
    further data corruption models of varying strength and of establishing stricter
    data collection practices for fairness-aware learning.
acknowledgement: "This paper is a shortened, workshop version of Konstantinov and
  Lampert (2021), https://arxiv.org/abs/2102.06004. For further results, including
  an analysis of algorithms achieving the lower bounds from this paper, we refer to
  the full version."
article_processing_charge: No
arxiv: 1
author:
- first_name: Nikola H
  full_name: Konstantinov, Nikola H
  id: 4B9D76E4-F248-11E8-B48F-1D18A9856A87
  last_name: Konstantinov
- first_name: Christoph
  full_name: Lampert, Christoph
  id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
  last_name: Lampert
  orcid: 0000-0001-8622-7887
citation:
  ama: 'Konstantinov NH, Lampert C. On the impossibility of fairness-aware learning
    from corrupted data. In: <i>Proceedings of Machine Learning Research</i>. Vol
    171. ML Research Press; 2022:59-83.'
  apa: Konstantinov, N. H., &#38; Lampert, C. (2022). On the impossibility of fairness-aware
    learning from corrupted data. In <i>Proceedings of Machine Learning Research</i>
    (Vol. 171, pp. 59–83). ML Research Press.
  chicago: Konstantinov, Nikola H, and Christoph Lampert. “On the Impossibility of
    Fairness-Aware Learning from Corrupted Data.” In <i>Proceedings of Machine Learning
    Research</i>, 171:59–83. ML Research Press, 2022.
  ieee: N. H. Konstantinov and C. Lampert, “On the impossibility of fairness-aware
    learning from corrupted data,” in <i>Proceedings of Machine Learning Research</i>,
    2022, vol. 171, pp. 59–83.
  ista: Konstantinov NH, Lampert C. 2022. On the impossibility of fairness-aware learning
    from corrupted data. Proceedings of Machine Learning Research. vol. 171, 59–83.
  mla: Konstantinov, Nikola H., and Christoph Lampert. “On the Impossibility of Fairness-Aware
    Learning from Corrupted Data.” <i>Proceedings of Machine Learning Research</i>,
    vol. 171, ML Research Press, 2022, pp. 59–83.
  short: N.H. Konstantinov, C. Lampert, in:, Proceedings of Machine Learning Research,
    ML Research Press, 2022, pp. 59–83.
date_created: 2023-07-16T22:01:13Z
date_published: 2022-12-01T00:00:00Z
date_updated: 2023-09-26T10:44:37Z
day: '01'
department:
- _id: ChLa
external_id:
  arxiv:
  - '2102.06004'
intvolume: '       171'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://arxiv.org/abs/2102.06004
month: '12'
oa: 1
oa_version: Preprint
page: 59-83
publication: Proceedings of Machine Learning Research
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
related_material:
  record:
  - id: '10802'
    relation: extended_version
    status: public
scopus_import: '1'
status: public
title: On the impossibility of fairness-aware learning from corrupted data
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 171
year: '2022'
...
---
_id: '13146'
abstract:
- lang: eng
  text: 'A recent line of work has analyzed the theoretical properties of deep neural
    networks via the Neural Tangent Kernel (NTK). In particular, the smallest eigenvalue
    of the NTK has been related to the memorization capacity, the global convergence
    of gradient descent algorithms and the generalization of deep nets. However, existing
    results either provide bounds in the two-layer setting or assume that the spectrum
    of the NTK matrices is bounded away from 0 for multi-layer networks. In this paper,
    we provide tight bounds on the smallest eigenvalue of NTK matrices for deep ReLU
    nets, both in the limiting case of infinite widths and for finite widths. In the
    finite-width setting, the network architectures we consider are fairly general:
    we require the existence of a wide layer with roughly order of N neurons, N being
    the number of data samples; and the scaling of the remaining layer widths is arbitrary
    (up to logarithmic factors). To obtain our results, we analyze various quantities
    of independent interest: we give lower bounds on the smallest singular value of
    hidden feature matrices, and upper bounds on the Lipschitz constant of input-output
    feature maps.'
acknowledgement: The authors would like to thank the anonymous reviewers for their
  helpful comments. MM was partially supported by the 2019 Lopez-Loreta Prize. QN
  and GM acknowledge support from the European Research Council (ERC) under the European
  Union’s Horizon 2020 research and innovation programme (grant agreement no 757983).
article_processing_charge: No
arxiv: 1
author:
- first_name: Quynh
  full_name: Nguyen, Quynh
  last_name: Nguyen
- first_name: Marco
  full_name: Mondelli, Marco
  id: 27EB676C-8706-11E9-9510-7717E6697425
  last_name: Mondelli
  orcid: 0000-0002-3242-7020
- first_name: Guido
  full_name: Montufar, Guido
  last_name: Montufar
citation:
  ama: 'Nguyen Q, Mondelli M, Montufar G. Tight bounds on the smallest Eigenvalue
    of the neural tangent kernel for deep ReLU networks. In: <i>Proceedings of the
    38th International Conference on Machine Learning</i>. Vol 139. ML Research Press;
    2021:8119-8129.'
  apa: 'Nguyen, Q., Mondelli, M., &#38; Montufar, G. (2021). Tight bounds on the smallest
    Eigenvalue of the neural tangent kernel for deep ReLU networks. In <i>Proceedings
    of the 38th International Conference on Machine Learning</i> (Vol. 139, pp. 8119–8129).
    Virtual: ML Research Press.'
  chicago: Nguyen, Quynh, Marco Mondelli, and Guido Montufar. “Tight Bounds on the
    Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks.” In <i>Proceedings
    of the 38th International Conference on Machine Learning</i>, 139:8119–29. ML
    Research Press, 2021.
  ieee: Q. Nguyen, M. Mondelli, and G. Montufar, “Tight bounds on the smallest Eigenvalue
    of the neural tangent kernel for deep ReLU networks,” in <i>Proceedings of the
    38th International Conference on Machine Learning</i>, Virtual, 2021, vol. 139,
    pp. 8119–8129.
  ista: Nguyen Q, Mondelli M, Montufar G. 2021. Tight bounds on the smallest Eigenvalue
    of the neural tangent kernel for deep ReLU networks. Proceedings of the 38th International
    Conference on Machine Learning. International Conference on Machine Learning vol.
    139, 8119–8129.
  mla: Nguyen, Quynh, et al. “Tight Bounds on the Smallest Eigenvalue of the Neural
    Tangent Kernel for Deep ReLU Networks.” <i>Proceedings of the 38th International
    Conference on Machine Learning</i>, vol. 139, ML Research Press, 2021, pp. 8119–29.
  short: Q. Nguyen, M. Mondelli, G. Montufar, in:, Proceedings of the 38th International
    Conference on Machine Learning, ML Research Press, 2021, pp. 8119–8129.
conference:
  end_date: 2021-07-24
  location: Virtual
  name: International Conference on Machine Learning
  start_date: 2021-07-18
date_created: 2023-06-18T22:00:48Z
date_published: 2021-07-01T00:00:00Z
date_updated: 2024-09-10T13:03:17Z
day: '01'
ddc:
- '000'
department:
- _id: MaMo
external_id:
  arxiv:
  - '2012.11654'
file:
- access_level: open_access
  checksum: 19489cf5e16a0596b1f92e317d97c9b0
  content_type: application/pdf
  creator: dernst
  date_created: 2023-06-19T10:49:12Z
  date_updated: 2023-06-19T10:49:12Z
  file_id: '13155'
  file_name: 2021_PMLR_Nguyen.pdf
  file_size: 591332
  relation: main_file
  success: 1
file_date_updated: 2023-06-19T10:49:12Z
has_accepted_license: '1'
intvolume: '       139'
language:
- iso: eng
month: '07'
oa: 1
oa_version: Published Version
page: 8119-8129
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
  name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Proceedings of the 38th International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
  isbn:
  - '9781713845065'
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: Tight bounds on the smallest Eigenvalue of the neural tangent kernel for deep
  ReLU networks
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 139
year: '2021'
...
---
_id: '13147'
abstract:
- lang: eng
  text: "We investigate fast and communication-efficient algorithms for the classic
    problem of minimizing a sum of strongly convex and smooth functions that are distributed
    among n different nodes, which can communicate using a limited number of bits.
    Most previous communication-efficient approaches for this problem are limited
    to first-order optimization, and therefore have linear dependence on the
    condition number in their communication complexity. We show that this dependence
    is not inherent: communication-efficient methods can in fact have sublinear dependence
    on the condition number. For this, we design and analyze the first communication-efficient
    distributed variants of preconditioned gradient descent for Generalized Linear
    Models, and for Newton’s method. Our results rely on a new technique for quantizing
    both the preconditioner and the descent direction at each step of the algorithms,
    while controlling their convergence rate. We also validate our findings experimentally,
    showing faster convergence and reduced communication relative to previous methods."
acknowledgement: The authors would like to thank Janne Korhonen, Aurelien Lucchi,
  Celestine Mendler-Dünner and Antonio Orvieto for helpful discussions. FA and DA
  were supported during this work by the European Research Council (ERC) under the
  European Union’s Horizon 2020 research and innovation programme (grant agreement
  No 805223 ScaleML). PD was supported by the European Union’s Horizon 2020 programme
  under the Marie Skłodowska-Curie grant agreement No. 754411.
article_processing_charge: No
arxiv: 1
author:
- first_name: Foivos
  full_name: Alimisis, Foivos
  last_name: Alimisis
- first_name: Peter
  full_name: Davies, Peter
  id: 11396234-BB50-11E9-B24C-90FCE5697425
  last_name: Davies
  orcid: 0000-0002-5646-9524
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Alimisis F, Davies P, Alistarh D-A. Communication-efficient distributed optimization
    with quantized preconditioners. In: <i>Proceedings of the 38th International Conference
    on Machine Learning</i>. Vol 139. ML Research Press; 2021:196-206.'
  apa: 'Alimisis, F., Davies, P., &#38; Alistarh, D.-A. (2021). Communication-efficient
    distributed optimization with quantized preconditioners. In <i>Proceedings of
    the 38th International Conference on Machine Learning</i> (Vol. 139, pp. 196–206).
    Virtual: ML Research Press.'
  chicago: Alimisis, Foivos, Peter Davies, and Dan-Adrian Alistarh. “Communication-Efficient
    Distributed Optimization with Quantized Preconditioners.” In <i>Proceedings of
    the 38th International Conference on Machine Learning</i>, 139:196–206. ML Research
    Press, 2021.
  ieee: F. Alimisis, P. Davies, and D.-A. Alistarh, “Communication-efficient distributed
    optimization with quantized preconditioners,” in <i>Proceedings of the 38th International
    Conference on Machine Learning</i>, Virtual, 2021, vol. 139, pp. 196–206.
  ista: Alimisis F, Davies P, Alistarh D-A. 2021. Communication-efficient distributed
    optimization with quantized preconditioners. Proceedings of the 38th International
    Conference on Machine Learning. International Conference on Machine Learning vol.
    139, 196–206.
  mla: Alimisis, Foivos, et al. “Communication-Efficient Distributed Optimization
    with Quantized Preconditioners.” <i>Proceedings of the 38th International Conference
    on Machine Learning</i>, vol. 139, ML Research Press, 2021, pp. 196–206.
  short: F. Alimisis, P. Davies, D.-A. Alistarh, in:, Proceedings of the 38th International
    Conference on Machine Learning, ML Research Press, 2021, pp. 196–206.
conference:
  end_date: 2021-07-24
  location: Virtual
  name: International Conference on Machine Learning
  start_date: 2021-07-18
date_created: 2023-06-18T22:00:48Z
date_published: 2021-07-01T00:00:00Z
date_updated: 2023-06-19T10:44:38Z
day: '01'
ddc:
- '000'
department:
- _id: DaAl
ec_funded: 1
external_id:
  arxiv:
  - '2102.07214'
file:
- access_level: open_access
  checksum: 7ec0d59bac268b49c76bf2e036dedd7a
  content_type: application/pdf
  creator: dernst
  date_created: 2023-06-19T10:41:05Z
  date_updated: 2023-06-19T10:41:05Z
  file_id: '13154'
  file_name: 2021_PMLR_Alimisis.pdf
  file_size: 429087
  relation: main_file
  success: 1
file_date_updated: 2023-06-19T10:41:05Z
has_accepted_license: '1'
intvolume: '       139'
language:
- iso: eng
month: '07'
oa: 1
oa_version: Published Version
page: 196-206
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
- _id: 260C2330-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '754411'
  name: ISTplus - Postdoctoral Fellowships
publication: Proceedings of the 38th International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
  isbn:
  - '9781713845065'
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: Communication-efficient distributed optimization with quantized preconditioners
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 139
year: '2021'
...
---
_id: '11651'
abstract:
- lang: eng
  text: Diffusions and related random walk procedures are of central importance in
    many areas of machine learning, data analysis, and applied mathematics. Because
    they spread mass agnostically at each step in an iterative manner, they can sometimes
    spread mass “too aggressively,” thereby failing to find the “right” clusters.
    We introduce a novel Capacity Releasing Diffusion (CRD) Process, which is both
    faster and stays more local than the classical spectral diffusion process. As
    an application, we use our CRD Process to develop an improved local algorithm
    for graph clustering. Our local graph clustering method can find local clusters
    in a model of clustering where one begins the CRD Process in a cluster whose vertices
    are connected better internally than externally by an O(log^2 n) factor, where n
    is the number of nodes in the cluster. Thus, our CRD Process is the first local
    graph clustering algorithm that is not subject to the well-known quadratic Cheeger
    barrier. Our result requires a certain smoothness condition, which we expect to
    be an artifact of our analysis. Our empirical evaluation demonstrates improved
    results, in particular for realistic social graphs where there are moderately
    good—but not very good—clusters.
alternative_title:
- PMLR
article_processing_charge: No
arxiv: 1
author:
- first_name: Di
  full_name: Wang, Di
  last_name: Wang
- first_name: Kimon
  full_name: Fountoulakis, Kimon
  last_name: Fountoulakis
- first_name: Monika H
  full_name: Henzinger, Monika H
  id: 540c9bbd-f2de-11ec-812d-d04a5be85630
  last_name: Henzinger
  orcid: 0000-0002-5008-6530
- first_name: Michael W.
  full_name: Mahoney, Michael W.
  last_name: Mahoney
- first_name: Satish
  full_name: Rao, Satish
  last_name: Rao
citation:
  ama: 'Wang D, Fountoulakis K, Henzinger MH, Mahoney MW, Rao S. Capacity releasing
    diffusion for speed and locality. In: <i>Proceedings of the 34th International
    Conference on Machine Learning</i>. Vol 70. ML Research Press; 2017:3598-3607.'
  apa: 'Wang, D., Fountoulakis, K., Henzinger, M. H., Mahoney, M. W., &#38; Rao, S.
    (2017). Capacity releasing diffusion for speed and locality. In <i>Proceedings
    of the 34th International Conference on Machine Learning</i> (Vol. 70, pp. 3598–3607).
    Sydney, Australia: ML Research Press.'
  chicago: Wang, Di, Kimon Fountoulakis, Monika H Henzinger, Michael W. Mahoney, and Satish
    Rao. “Capacity Releasing Diffusion for Speed and Locality.” In <i>Proceedings
    of the 34th International Conference on Machine Learning</i>, 70:3598–3607. ML
    Research Press, 2017.
  ieee: D. Wang, K. Fountoulakis, M. H. Henzinger, M. W. Mahoney, and S. Rao, “Capacity
    releasing diffusion for speed and locality,” in <i>Proceedings of
    the 34th International Conference on Machine Learning</i>, Sydney, Australia,
    2017, vol. 70, pp. 3598–3607.
  ista: Wang D, Fountoulakis K, Henzinger MH, Mahoney MW, Rao S. 2017. Capacity
    releasing diffusion for speed and locality. Proceedings of the 34th International
    Conference on Machine Learning. International Conference on Machine Learning,
    PMLR, vol. 70, 3598–3607.
  mla: Wang, Di, et al. “Capacity Releasing Diffusion for Speed and Locality.” <i>Proceedings
    of the 34th International Conference on Machine Learning</i>, vol. 70, ML Research
    Press, 2017, pp. 3598–607.
  short: D. Wang, K. Fountoulakis, M.H. Henzinger, M.W. Mahoney, S. Rao, in:,
    Proceedings of the 34th International Conference on Machine Learning, ML Research
    Press, 2017, pp. 3598–3607.
conference:
  end_date: 2017-08-11
  location: Sydney, Australia
  name: International Conference on Machine Learning
  start_date: 2017-08-06
date_created: 2022-07-25T13:59:21Z
date_published: 2017-09-01T00:00:00Z
date_updated: 2023-02-09T09:15:31Z
day: '01'
extern: '1'
external_id:
  arxiv:
  - '1706.05826'
intvolume: '        70'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: http://proceedings.mlr.press/v70/wang17b/wang17b.pdf
month: '09'
oa: 1
oa_version: Published Version
page: 3598-3607
publication: Proceedings of the 34th International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
status: public
title: Capacity releasing diffusion for speed and locality
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 70
year: '2017'
...
