[{"volume":234,"abstract":[{"text":"Pruning large language models (LLMs) from the BERT family has emerged as a standard compression benchmark, and several pruning methods have been proposed for this task. The recent “Sparsity May Cry” (SMC) benchmark put into question the validity of all existing methods, exhibiting a more complex setup where many known pruning methods appear to fail. We revisit the question of accurate BERT-pruning during fine-tuning on downstream datasets, and propose a set of general guidelines for successful pruning, even on the challenging SMC benchmark. First, we perform a cost-vs-benefits analysis of pruning model components, such as the embeddings and the classification head; second, we provide a simple-yet-general way of scaling training, sparsification and learning rate schedules relative to the desired target sparsity; finally, we investigate the importance of proper parametrization for Knowledge Distillation in the context of LLMs. Our simple insights lead to state-of-the-art results, both on classic BERT-pruning benchmarks, as well as on the SMC benchmark, showing that even classic gradual magnitude pruning (GMP) can yield competitive results, with the right approach.","lang":"eng"}],"arxiv":1,"day":"08","external_id":{"arxiv":["2312.13547"]},"date_updated":"2024-02-26T10:30:52Z","year":"2024","citation":{"ieee":"E. Kurtic, T. Hoefler, and D.-A. Alistarh, “How to prune your language model: Recovering accuracy on the ‘Sparsity May Cry’ benchmark,” in <i>Proceedings of Machine Learning Research</i>, Hongkong, China, 2024, vol. 234, pp. 542–553.","chicago":"Kurtic, Eldar, Torsten Hoefler, and Dan-Adrian Alistarh. “How to Prune Your Language Model: Recovering Accuracy on the ‘Sparsity May Cry’ Benchmark.” In <i>Proceedings of Machine Learning Research</i>, 234:542–53. ML Research Press, 2024.","ama":"Kurtic E, Hoefler T, Alistarh D-A. How to prune your language model: Recovering accuracy on the “Sparsity May Cry” benchmark. In: <i>Proceedings of Machine Learning Research</i>. Vol 234. ML Research Press; 2024:542-553.","apa":"Kurtic, E., Hoefler, T., &#38; Alistarh, D.-A. (2024). How to prune your language model: Recovering accuracy on the “Sparsity May Cry” benchmark. In <i>Proceedings of Machine Learning Research</i> (Vol. 234, pp. 542–553). Hongkong, China: ML Research Press.","ista":"Kurtic E, Hoefler T, Alistarh D-A. 2024. How to prune your language model: Recovering accuracy on the ‘Sparsity May Cry’ benchmark. Proceedings of Machine Learning Research. CPAL: Conference on Parsimony and Learning, PMLR, vol. 234, 542–553.","short":"E. Kurtic, T. Hoefler, D.-A. Alistarh, in:, Proceedings of Machine Learning Research, ML Research Press, 2024, pp. 542–553.","mla":"Kurtic, Eldar, et al. “How to Prune Your Language Model: Recovering Accuracy on the ‘Sparsity May Cry’ Benchmark.” <i>Proceedings of Machine Learning Research</i>, vol. 234, ML Research Press, 2024, pp. 
542–53."},"publisher":"ML Research Press","page":"542-553","quality_controlled":"1","title":"How to prune your language model: Recovering accuracy on the \"Sparsity May Cry\" benchmark","alternative_title":["PMLR"],"intvolume":"       234","publication_status":"published","department":[{"_id":"DaAl"}],"date_created":"2024-02-18T23:01:03Z","article_processing_charge":"No","author":[{"last_name":"Kurtic","first_name":"Eldar","full_name":"Kurtic, Eldar","id":"47beb3a5-07b5-11eb-9b87-b108ec578218"},{"full_name":"Hoefler, Torsten","first_name":"Torsten","last_name":"Hoefler"},{"first_name":"Dan-Adrian","last_name":"Alistarh","orcid":"0000-0003-3650-940X","full_name":"Alistarh, Dan-Adrian","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"}],"_id":"15011","scopus_import":"1","status":"public","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","main_file_link":[{"open_access":"1","url":"https://proceedings.mlr.press/v234/kurtic24a"}],"oa":1,"publication_identifier":{"eissn":["2640-3498"]},"date_published":"2024-01-08T00:00:00Z","type":"conference","conference":{"location":"Hongkong, China","end_date":"2024-01-06","start_date":"2024-01-03","name":"CPAL: Conference on Parsimony and Learning"},"language":[{"iso":"eng"}],"month":"01","oa_version":"Preprint","publication":"Proceedings of Machine Learning Research"},{"status":"public","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","main_file_link":[{"open_access":"1","url":"https://doi.org/10.48550/arXiv.2301.00774"}],"date_published":"2023-07-30T00:00:00Z","type":"conference","oa":1,"publication_identifier":{"eissn":["2640-3498"]},"language":[{"iso":"eng"}],"conference":{"location":"Honolulu, Hawaii, HI, United States","end_date":"2023-07-29","name":"ICML: International Conference on Machine Learning","start_date":"2023-07-23"},"publication":"Proceedings of the 40th International Conference on Machine Learning","month":"07","oa_version":"Preprint","acknowledged_ssus":[{"_id":"ScienComp"}],"project":[{"call_identifier":"H2020","_id":"268A44D6-B435-11E9-9278-68D0E5697425","name":"Elastic Coordination for Scalable Machine Learning","grant_number":"805223"}],"acknowledgement":"The authors gratefully acknowledge funding from the European Research Council (ERC) under the European Union’s Horizon 2020 programme (grant agreement No. 805223 ScaleML), as well as experimental support from Eldar Kurtic, and from the IST Austria IT department, in particular Stefano Elefante, Andrei Hornoiu, and Alois Schloegl.","volume":202,"external_id":{"arxiv":["2301.00774"]},"date_updated":"2023-10-31T09:59:42Z","year":"2023","citation":{"ista":"Frantar E, Alistarh D-A. 2023. SparseGPT: Massive language models can be accurately pruned in one-shot. Proceedings of the 40th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 202, 10323–10337.","short":"E. Frantar, D.-A. Alistarh, in:, Proceedings of the 40th International Conference on Machine Learning, ML Research Press, 2023, pp. 10323–10337.","mla":"Frantar, Elias, and Dan-Adrian Alistarh. “SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot.” <i>Proceedings of the 40th International Conference on Machine Learning</i>, vol. 202, ML Research Press, 2023, pp. 10323–37.","chicago":"Frantar, Elias, and Dan-Adrian Alistarh. “SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot.” In <i>Proceedings of the 40th International Conference on Machine Learning</i>, 202:10323–37. ML Research Press, 2023.","ieee":"E. Frantar and D.-A. 
Alistarh, “SparseGPT: Massive language models can be accurately pruned in one-shot,” in <i>Proceedings of the 40th International Conference on Machine Learning</i>, Honolulu, Hawaii, HI, United States, 2023, vol. 202, pp. 10323–10337.","ama":"Frantar E, Alistarh D-A. SparseGPT: Massive language models can be accurately pruned in one-shot. In: <i>Proceedings of the 40th International Conference on Machine Learning</i>. Vol 202. ML Research Press; 2023:10323-10337.","apa":"Frantar, E., &#38; Alistarh, D.-A. (2023). SparseGPT: Massive language models can be accurately pruned in one-shot. In <i>Proceedings of the 40th International Conference on Machine Learning</i> (Vol. 202, pp. 10323–10337). Honolulu, Hawaii, HI, United States: ML Research Press."},"abstract":[{"lang":"eng","text":"We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt."}],"arxiv":1,"day":"30","page":"10323-10337","ec_funded":1,"quality_controlled":"1","publisher":"ML Research Press","author":[{"id":"09a8f98d-ec99-11ea-ae11-c063a7b7fe5f","last_name":"Frantar","first_name":"Elias","full_name":"Frantar, Elias"},{"last_name":"Alistarh","first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian","orcid":"0000-0003-3650-940X","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"}],"_id":"14458","scopus_import":"1","title":"SparseGPT: Massive language models can be accurately pruned in one-shot","alternative_title":["PMLR"],"intvolume":"       202","publication_status":"published","department":[{"_id":"DaAl"}],"date_created":"2023-10-29T23:01:16Z","article_processing_charge":"No"},{"publication_identifier":{"eissn":["2640-3498"]},"oa":1,"date_published":"2023-07-30T00:00:00Z","type":"conference","main_file_link":[{"url":"https://doi.org/10.48550/arXiv.2212.13468","open_access":"1"}],"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","status":"public","oa_version":"Preprint","project":[{"name":"Prix Lopez-Loretta 2019 - Marco Mondelli","_id":"059876FA-7A3F-11EA-A408-12923DDC885E"}],"month":"07","publication":"Proceedings of the 40th International Conference on Machine Learning","conference":{"name":"ICML: International Conference on Machine Learning","start_date":"2023-07-23","end_date":"2023-07-29","location":"Honolulu, Hawaii, HI, United States"},"language":[{"iso":"eng"}],"arxiv":1,"day":"30","abstract":[{"lang":"eng","text":"Autoencoders are a popular model in many branches of machine learning and lossy data compression. However, their fundamental limits, the performance of gradient methods and the features learnt during optimization remain poorly understood, even in the two-layer setting. In fact, earlier work has considered either linear autoencoders or specific training regimes (leading to vanishing or diverging compression rates). 
Our paper addresses this gap by focusing on non-linear two-layer autoencoders trained in the challenging proportional regime in which the input dimension scales linearly with the size of the representation. Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods; their structure is also unveiled, thus leading to a concise description of the features obtained via training. For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders. Finally, while the results are proved for Gaussian data, numerical simulations on standard datasets display the universality of the theoretical predictions."}],"date_updated":"2024-09-10T13:03:19Z","citation":{"ista":"Shevchenko A, Kögler K, Hassani H, Mondelli M. 2023. Fundamental limits of two-layer autoencoders, and achieving them with gradient methods. Proceedings of the 40th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 202, 31151–31209.","short":"A. Shevchenko, K. Kögler, H. Hassani, M. Mondelli, in:, Proceedings of the 40th International Conference on Machine Learning, ML Research Press, 2023, pp. 31151–31209.","mla":"Shevchenko, Aleksandr, et al. “Fundamental Limits of Two-Layer Autoencoders, and Achieving Them with Gradient Methods.” <i>Proceedings of the 40th International Conference on Machine Learning</i>, vol. 202, ML Research Press, 2023, pp. 31151–209.","ieee":"A. Shevchenko, K. Kögler, H. Hassani, and M. Mondelli, “Fundamental limits of two-layer autoencoders, and achieving them with gradient methods,” in <i>Proceedings of the 40th International Conference on Machine Learning</i>, Honolulu, Hawaii, HI, United States, 2023, vol. 202, pp. 31151–31209.","chicago":"Shevchenko, Aleksandr, Kevin Kögler, Hamed Hassani, and Marco Mondelli. “Fundamental Limits of Two-Layer Autoencoders, and Achieving Them with Gradient Methods.” In <i>Proceedings of the 40th International Conference on Machine Learning</i>, 202:31151–209. ML Research Press, 2023.","apa":"Shevchenko, A., Kögler, K., Hassani, H., &#38; Mondelli, M. (2023). Fundamental limits of two-layer autoencoders, and achieving them with gradient methods. In <i>Proceedings of the 40th International Conference on Machine Learning</i> (Vol. 202, pp. 31151–31209). Honolulu, Hawaii, HI, United States: ML Research Press.","ama":"Shevchenko A, Kögler K, Hassani H, Mondelli M. Fundamental limits of two-layer autoencoders, and achieving them with gradient methods. In: <i>Proceedings of the 40th International Conference on Machine Learning</i>. Vol 202. ML Research Press; 2023:31151-31209."},"year":"2023","external_id":{"arxiv":["2212.13468"]},"acknowledgement":"Aleksandr Shevchenko, Kevin Kogler and Marco Mondelli are supported by the 2019 Lopez-Loreta Prize. 
Hamed Hassani acknowledges the support by the NSF CIF award (1910056) and the NSF Institute for CORE Emerging Methods in Data Science (EnCORE).","volume":202,"publication_status":"published","article_processing_charge":"No","date_created":"2023-10-29T23:01:17Z","department":[{"_id":"MaMo"},{"_id":"DaAl"}],"title":"Fundamental limits of two-layer autoencoders, and achieving them with gradient methods","alternative_title":["PMLR"],"intvolume":"       202","_id":"14459","scopus_import":"1","author":[{"full_name":"Shevchenko, Aleksandr","first_name":"Aleksandr","last_name":"Shevchenko","id":"F2B06EC2-C99E-11E9-89F0-752EE6697425"},{"id":"94ec913c-dc85-11ea-9058-e5051ab2428b","full_name":"Kögler, Kevin","last_name":"Kögler","first_name":"Kevin"},{"full_name":"Hassani, Hamed","first_name":"Hamed","last_name":"Hassani"},{"first_name":"Marco","last_name":"Mondelli","orcid":"0000-0002-3242-7020","full_name":"Mondelli, Marco","id":"27EB676C-8706-11E9-9510-7717E6697425"}],"publisher":"ML Research Press","page":"31151-31209","quality_controlled":"1"},{"publication":"Proceedings of the 40th International Conference on Machine Learning","oa_version":"Preprint","project":[{"name":"Elastic Coordination for Scalable Machine Learning","grant_number":"805223","call_identifier":"H2020","_id":"268A44D6-B435-11E9-9278-68D0E5697425"}],"month":"07","language":[{"iso":"eng"}],"conference":{"start_date":"2023-07-23","name":"ICML: International Conference on Machine Learning","location":"Honolulu, Hawaii, HI, United States","end_date":"2023-07-29"},"date_published":"2023-07-30T00:00:00Z","type":"conference","publication_identifier":{"eissn":["2640-3498"]},"oa":1,"main_file_link":[{"open_access":"1","url":"https://doi.org/10.48550/arXiv.2302.04852"}],"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","status":"public","_id":"14460","scopus_import":"1","author":[{"last_name":"Nikdan","first_name":"Mahdi","full_name":"Nikdan, Mahdi","id":"66374281-f394-11eb-9cf6-869147deecc0"},{"full_name":"Pegolotti, Tommaso","first_name":"Tommaso","last_name":"Pegolotti"},{"id":"f9a17499-f6e0-11ea-865d-fdf9a3f77117","full_name":"Iofinova, Eugenia B","orcid":"0000-0002-7778-3221","last_name":"Iofinova","first_name":"Eugenia B"},{"full_name":"Kurtic, Eldar","first_name":"Eldar","last_name":"Kurtic","id":"47beb3a5-07b5-11eb-9b87-b108ec578218"},{"id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X","full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh"}],"publication_status":"published","date_created":"2023-10-29T23:01:17Z","department":[{"_id":"DaAl"}],"article_processing_charge":"No","alternative_title":["PMLR"],"title":"SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge","intvolume":"       202","page":"26215-26227","ec_funded":1,"quality_controlled":"1","publisher":"ML Research Press","date_updated":"2023-10-31T09:33:51Z","year":"2023","citation":{"mla":"Nikdan, Mahdi, et al. “SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks at the Edge.” <i>Proceedings of the 40th International Conference on Machine Learning</i>, vol. 202, ML Research Press, 2023, pp. 26215–27.","short":"M. Nikdan, T. Pegolotti, E.B. Iofinova, E. Kurtic, D.-A. Alistarh, in:, Proceedings of the 40th International Conference on Machine Learning, ML Research Press, 2023, pp. 26215–26227.","ista":"Nikdan M, Pegolotti T, Iofinova EB, Kurtic E, Alistarh D-A. 2023. 
SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge. Proceedings of the 40th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 202, 26215–26227.","apa":"Nikdan, M., Pegolotti, T., Iofinova, E. B., Kurtic, E., &#38; Alistarh, D.-A. (2023). SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge. In <i>Proceedings of the 40th International Conference on Machine Learning</i> (Vol. 202, pp. 26215–26227). Honolulu, Hawaii, HI, United States: ML Research Press.","ama":"Nikdan M, Pegolotti T, Iofinova EB, Kurtic E, Alistarh D-A. SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge. In: <i>Proceedings of the 40th International Conference on Machine Learning</i>. Vol 202. ML Research Press; 2023:26215-26227.","chicago":"Nikdan, Mahdi, Tommaso Pegolotti, Eugenia B Iofinova, Eldar Kurtic, and Dan-Adrian Alistarh. “SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks at the Edge.” In <i>Proceedings of the 40th International Conference on Machine Learning</i>, 202:26215–27. ML Research Press, 2023.","ieee":"M. Nikdan, T. Pegolotti, E. B. Iofinova, E. Kurtic, and D.-A. Alistarh, “SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge,” in <i>Proceedings of the 40th International Conference on Machine Learning</i>, Honolulu, Hawaii, HI, United States, 2023, vol. 202, pp. 26215–26227."},"external_id":{"arxiv":["2302.04852"]},"arxiv":1,"day":"30","abstract":[{"text":"We provide an efficient implementation of the backpropagation algorithm, specialized to the case where the weights of the neural network being trained are sparse. Our algorithm is general, as it applies to arbitrary (unstructured) sparsity and common layer types (e.g., convolutional or linear). We provide a fast vectorized implementation on commodity CPUs, and show that it can yield speedups in end-to-end runtime experiments, both in transfer learning using already-sparsified networks, and in training sparse networks from scratch. Thus, our results provide the first support for sparse training on commodity hardware.","lang":"eng"}],"acknowledgement":"We would like to thank Elias Frantar for his valuable assistance and support at the outset of this project, and the anonymous ICML and SNN reviewers for very constructive feedback. EI was supported in part by the FWF DK VGSCO, grant agreement number W1260-N35. DA acknowledges generous ERC support, via Starting Grant 805223 ScaleML. 
","volume":202},{"status":"public","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","main_file_link":[{"url":"https://doi.org/10.48550/arXiv.2302.02390","open_access":"1"}],"date_published":"2023-07-30T00:00:00Z","type":"conference","oa":1,"publication_identifier":{"eissn":["2640-3498"]},"language":[{"iso":"eng"}],"conference":{"name":"ICML: International Conference on Machine Learning","start_date":"2023-07-23","end_date":"2023-07-29","location":"Honolulu, Hawaii, HI, United States"},"publication":"Proceedings of the 40th International Conference on Machine Learning","month":"07","oa_version":"Preprint","acknowledged_ssus":[{"_id":"ScienComp"}],"project":[{"name":"Elastic Coordination for Scalable Machine Learning","grant_number":"805223","call_identifier":"H2020","_id":"268A44D6-B435-11E9-9278-68D0E5697425"}],"acknowledgement":"The authors gratefully acknowledge funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML), as well as experimental support from the IST Austria IT department, in particular Stefano Elefante, Andrei Hornoiu, and Alois Schloegl. AV acknowledges the support of the French Agence Nationale de la Recherche (ANR), under grant ANR-21-CE48-0016 (project COMCOPT), the support of Fondation Hadamard with a PRMO grant, and the support of CNRS with a CoopIntEER IEA grant (project ALFRED).","volume":202,"external_id":{"arxiv":["2302.02390"]},"date_updated":"2023-10-31T09:40:45Z","citation":{"ieee":"I. Markov, A. Vladu, Q. Guo, and D.-A. Alistarh, “Quantized distributed training of large models with convergence guarantees,” in <i>Proceedings of the 40th International Conference on Machine Learning</i>, Honolulu, Hawaii, HI, United States, 2023, vol. 202, pp. 24020–24044.","chicago":"Markov, Ilia, Adrian Vladu, Qi Guo, and Dan-Adrian Alistarh. “Quantized Distributed Training of Large Models with Convergence Guarantees.” In <i>Proceedings of the 40th International Conference on Machine Learning</i>, 202:24020–44. ML Research Press, 2023.","ama":"Markov I, Vladu A, Guo Q, Alistarh D-A. Quantized distributed training of large models with convergence guarantees. In: <i>Proceedings of the 40th International Conference on Machine Learning</i>. Vol 202. ML Research Press; 2023:24020-24044.","apa":"Markov, I., Vladu, A., Guo, Q., &#38; Alistarh, D.-A. (2023). Quantized distributed training of large models with convergence guarantees. In <i>Proceedings of the 40th International Conference on Machine Learning</i> (Vol. 202, pp. 24020–24044). Honolulu, Hawaii, HI, United States: ML Research Press.","ista":"Markov I, Vladu A, Guo Q, Alistarh D-A. 2023. Quantized distributed training of large models with convergence guarantees. Proceedings of the 40th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 202, 24020–24044.","mla":"Markov, Ilia, et al. “Quantized Distributed Training of Large Models with Convergence Guarantees.” <i>Proceedings of the 40th International Conference on Machine Learning</i>, vol. 202, ML Research Press, 2023, pp. 24020–44.","short":"I. Markov, A. Vladu, Q. Guo, D.-A. Alistarh, in:, Proceedings of the 40th International Conference on Machine Learning, ML Research Press, 2023, pp. 24020–24044."},"year":"2023","abstract":[{"text":"Communication-reduction techniques are a popular way to improve scalability in data-parallel training of deep neural networks (DNNs). 
The recent emergence of large language models such as GPT has created the need for new approaches to exploit data-parallelism. Among these, fully-sharded data parallel (FSDP) training is highly popular, yet it still encounters scalability bottlenecks. One reason is that applying compression techniques to FSDP is challenging: as the vast majority of the communication involves the model’s weights, direct compression alters convergence and leads to accuracy loss. We present QSDP, a variant of FSDP which supports both gradient and weight quantization with theoretical guarantees, is simple to implement and has essentially no overheads. To derive QSDP we prove that a natural modification of SGD achieves convergence even when we only maintain quantized weights, and thus the domain over which we train consists of quantized points and is, therefore, highly non-convex. We validate this approach by training GPT-family models with up to 1.3 billion parameters on a multi-node cluster. Experiments show that QSDP preserves model accuracy, while completely removing the communication bottlenecks of FSDP, providing end-to-end speedups of up to 2.2x.","lang":"eng"}],"arxiv":1,"day":"30","page":"24020-24044","ec_funded":1,"quality_controlled":"1","publisher":"ML Research Press","author":[{"id":"D0CF4148-C985-11E9-8066-0BDEE5697425","full_name":"Markov, Ilia","first_name":"Ilia","last_name":"Markov"},{"last_name":"Vladu","first_name":"Adrian","full_name":"Vladu, Adrian"},{"first_name":"Qi","last_name":"Guo","full_name":"Guo, Qi"},{"id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian","orcid":"0000-0003-3650-940X"}],"_id":"14461","scopus_import":"1","alternative_title":["PMLR"],"title":"Quantized distributed training of large models with convergence guarantees","intvolume":"       202","publication_status":"published","date_created":"2023-10-29T23:01:17Z","department":[{"_id":"DaAl"}],"article_processing_charge":"No"},{"language":[{"iso":"eng"}],"conference":{"name":"ICML: International Conference on Machine Learning","start_date":"2023-07-23","end_date":"2023-07-29","location":"Honolulu, Hawaii, HI, United States"},"publication":"Proceedings of the 40th International Conference on Machine Learning","month":"07","project":[{"call_identifier":"H2020","_id":"bd9ca328-d553-11ed-ba76-dc4f890cfe62","name":"The design and evaluation of modern fully dynamic data structures","grant_number":"101019564"},{"_id":"34def286-11ca-11ed-8bc3-da5948e1613c","grant_number":"Z00422","name":"Wittgenstein Award - Monika Henzinger"},{"_id":"bd9e3a2e-d553-11ed-ba76-8aa684ce17fe","name":"Fast Algorithms for a Reactive Network Layer","grant_number":"P33775 "}],"oa_version":"Published Version","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","status":"public","main_file_link":[{"open_access":"1","url":"https://proceedings.mlr.press/v202/fichtenberger23a/fichtenberger23a.pdf"}],"type":"conference","date_published":"2023-07-30T00:00:00Z","oa":1,"publication_identifier":{"eissn":["2640-3498"]},"ec_funded":1,"quality_controlled":"1","page":"10072-10092","publisher":"ML Research Press","author":[{"first_name":"Hendrik","last_name":"Fichtenberger","full_name":"Fichtenberger, Hendrik"},{"last_name":"Henzinger","first_name":"Monika H","full_name":"Henzinger, Monika H","orcid":"0000-0002-5008-6530","id":"540c9bbd-f2de-11ec-812d-d04a5be85630"},{"first_name":"Jalaj","last_name":"Upadhyay","full_name":"Upadhyay, Jalaj"}],"scopus_import":"1","_id":"14462","intvolume":"       
202","alternative_title":["PMLR"],"title":"Constant matters: Fine-grained error bound on differentially private continual observation","date_created":"2023-10-29T23:01:17Z","department":[{"_id":"MoHe"}],"article_processing_charge":"No","publication_status":"published","volume":202,"acknowledgement":"This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No.\r\n101019564 “The Design of Modern Fully Dynamic Data Structures (MoDynStruct)” and from the Austrian Science Fund (FWF) project Z 422-N, and project “Fast Algorithms for a Reactive Network Layer (ReactNet)”, P 33775-N, with additional funding from the netidee SCIENCE Stiftung, 2020–2024. 2020–2024. JU’s research was funded by Decanal Research Grant. A part of this work was done when JU was visiting Indian Statistical Institute, Delhi. The authors would like to thank Rajat Bhatia, Aleksandar Nikolov, Shanta Laisharam, Vern Paulsen, Ryan Rogers, Abhradeep Thakurta, and Sarvagya Upadhyay for useful discussions.","year":"2023","citation":{"ama":"Fichtenberger H, Henzinger MH, Upadhyay J. Constant matters: Fine-grained error bound on differentially private continual observation. In: <i>Proceedings of the 40th International Conference on Machine Learning</i>. Vol 202. ML Research Press; 2023:10072-10092.","apa":"Fichtenberger, H., Henzinger, M. H., &#38; Upadhyay, J. (2023). Constant matters: Fine-grained error bound on differentially private continual observation. In <i>Proceedings of the 40th International Conference on Machine Learning</i> (Vol. 202, pp. 10072–10092). Honolulu, Hawaii, HI, United States: ML Research Press.","chicago":"Fichtenberger, Hendrik, Monika H Henzinger, and Jalaj Upadhyay. “Constant Matters: Fine-Grained Error Bound on Differentially Private Continual Observation.” In <i>Proceedings of the 40th International Conference on Machine Learning</i>, 202:10072–92. ML Research Press, 2023.","ieee":"H. Fichtenberger, M. H. Henzinger, and J. Upadhyay, “Constant matters: Fine-grained error bound on differentially private continual observation,” in <i>Proceedings of the 40th International Conference on Machine Learning</i>, Honolulu, Hawaii, HI, United States, 2023, vol. 202, pp. 10072–10092.","short":"H. Fichtenberger, M.H. Henzinger, J. Upadhyay, in:, Proceedings of the 40th International Conference on Machine Learning, ML Research Press, 2023, pp. 10072–10092.","mla":"Fichtenberger, Hendrik, et al. “Constant Matters: Fine-Grained Error Bound on Differentially Private Continual Observation.” <i>Proceedings of the 40th International Conference on Machine Learning</i>, vol. 202, ML Research Press, 2023, pp. 10072–92.","ista":"Fichtenberger H, Henzinger MH, Upadhyay J. 2023. Constant matters: Fine-grained error bound on differentially private continual observation. Proceedings of the 40th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 202, 10072–10092."},"date_updated":"2025-07-15T12:51:52Z","abstract":[{"text":"We study fine-grained error bounds for differentially private algorithms for counting under continual observation. Our main insight is that the matrix mechanism when using lower-triangular matrices can be used in the continual observation model. More specifically, we give an explicit factorization for the counting matrix Mcount and upper bound the error explicitly. 
We also give a fine-grained analysis, specifying the exact constant in the upper bound. Our analysis is based on upper and lower bounds of the completely bounded norm (cb-norm) of Mcount. Along the way, we improve the 28-year-old best-known bound by Mathias (SIAM Journal on Matrix Analysis and Applications, 1993) on the cb-norm of Mcount for a large range of the dimension of Mcount. Furthermore, we are the first to give concrete error bounds for various problems under continual observation such as binary counting, maintaining a histogram, releasing an approximately cut-preserving synthetic graph, many graph-based statistics, and substring and episode counting. Finally, we note that our result can be used to get a fine-grained error bound for non-interactive local learning and the first lower bounds on the additive error for (ϵ,δ)-differentially-private counting under continual observation. Subsequent to this work, Henzinger et al. (SODA, 2023) showed that our factorization also achieves fine-grained mean-squared error.","lang":"eng"}],"day":"30"},{"abstract":[{"lang":"eng","text":"Brains are thought to engage in predictive learning - learning to predict upcoming stimuli - to construct an internal model of their environment. This is especially notable for spatial navigation, as first described by Tolman’s latent learning tasks. However, predictive learning has also been observed in sensory cortex, in settings unrelated to spatial navigation. Apart from normative frameworks such as active inference or efficient coding, what could be the utility of learning to predict the patterns of occurrence of correlated stimuli? Here we show that prediction, and thereby the construction of an internal model of sequential stimuli, can bootstrap the learning process of a working memory task in a recurrent neural network. We implemented predictive learning alongside working memory match-tasks, and networks emerged to solve the prediction task first by encoding information across time to predict upcoming stimuli, and then eavesdropped on this solution to solve the matching task. Eavesdropping was most beneficial when neural resources were limited. Hence, predictive learning acts as a general neural mechanism to learn to store sensory information that can later be essential for working memory tasks."}],"day":"01","date_updated":"2023-07-18T06:36:28Z","year":"2022","citation":{"ista":"Van Der Plas TL, Vogels TP, Manohar SG. 2022. Predictive learning enables neural networks to learn complex working memory tasks. Proceedings of Machine Learning Research. vol. 199, 518–531.","mla":"Van Der Plas, Thijs L., et al. “Predictive Learning Enables Neural Networks to Learn Complex Working Memory Tasks.” <i>Proceedings of Machine Learning Research</i>, vol. 199, ML Research Press, 2022, pp. 518–31.","short":"T.L. Van Der Plas, T.P. Vogels, S.G. Manohar, in:, Proceedings of Machine Learning Research, ML Research Press, 2022, pp. 518–531.","ieee":"T. L. Van Der Plas, T. P. Vogels, and S. G. Manohar, “Predictive learning enables neural networks to learn complex working memory tasks,” in <i>Proceedings of Machine Learning Research</i>, 2022, vol. 199, pp. 518–531.","chicago":"Van Der Plas, Thijs L., Tim P Vogels, and Sanjay G. Manohar. “Predictive Learning Enables Neural Networks to Learn Complex Working Memory Tasks.” In <i>Proceedings of Machine Learning Research</i>, 199:518–31. ML Research Press, 2022.","apa":"Van Der Plas, T. L., Vogels, T. P., &#38; Manohar, S. G. (2022). 
Predictive learning enables neural networks to learn complex working memory tasks. In <i>Proceedings of Machine Learning Research</i> (Vol. 199, pp. 518–531). ML Research Press.","ama":"Van Der Plas TL, Vogels TP, Manohar SG. Predictive learning enables neural networks to learn complex working memory tasks. In: <i>Proceedings of Machine Learning Research</i>. Vol 199. ML Research Press; 2022:518-531."},"ddc":["000"],"acknowledgement":"The authors would like to thank members of the Vogels lab and Manohar lab, as well as Adam Packer, Andrew Saxe, Stefano Sarao Mannelli and Jacob Bakermans for fruitful discussions and comments on earlier versions of the manuscript.\r\nTLvdP was supported by funding from the Biotechnology and Biological Sciences Research Council (BBSRC) [grant number BB/M011224/1]. TPV was supported by an ERC Consolidator Grant (SYNAPSEEK). SGM was funded by a MRC Clinician Scientist Fellowship MR/P00878X and Leverhulme Grant RPG-2018-310.","volume":199,"title":"Predictive learning enables neural networks to learn complex working memory tasks","intvolume":"       199","publication_status":"published","article_processing_charge":"No","department":[{"_id":"TiVo"}],"date_created":"2023-07-16T22:01:12Z","author":[{"full_name":"Van Der Plas, Thijs L.","last_name":"Van Der Plas","first_name":"Thijs L."},{"id":"CB6FF8D2-008F-11EA-8E08-2637E6697425","last_name":"Vogels","first_name":"Tim P","full_name":"Vogels, Tim P","orcid":"0000-0003-3295-6181"},{"full_name":"Manohar, Sanjay G.","first_name":"Sanjay G.","last_name":"Manohar"}],"_id":"13239","scopus_import":"1","publisher":"ML Research Press","file_date_updated":"2023-07-18T06:32:38Z","page":"518-531","quality_controlled":"1","ec_funded":1,"oa":1,"publication_identifier":{"eissn":["2640-3498"]},"date_published":"2022-12-01T00:00:00Z","type":"conference","status":"public","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","file":[{"checksum":"7530a93ef42e10b4db1e5e4b69796e93","file_size":585135,"date_created":"2023-07-18T06:32:38Z","content_type":"application/pdf","file_name":"2022_PMLR_vanderPlas.pdf","date_updated":"2023-07-18T06:32:38Z","relation":"main_file","access_level":"open_access","success":1,"creator":"dernst","file_id":"13243"}],"month":"12","oa_version":"Published Version","project":[{"name":"Learning the shape of synaptic plasticity rules for neuronal architectures and function through machine learning.","grant_number":"819603","_id":"0aacfa84-070f-11eb-9043-d7eb2c709234","call_identifier":"H2020"}],"publication":"Proceedings of Machine Learning Research","has_accepted_license":"1","language":[{"iso":"eng"}]},{"publication":"Proceedings of Machine Learning Research","month":"12","oa_version":"Preprint","language":[{"iso":"eng"}],"date_published":"2022-12-01T00:00:00Z","type":"conference","oa":1,"publication_identifier":{"eissn":["2640-3498"]},"related_material":{"record":[{"id":"10802","relation":"extended_version","status":"public"}]},"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","status":"public","main_file_link":[{"open_access":"1","url":"https://arxiv.org/abs/2102.06004"}],"author":[{"id":"4B9D76E4-F248-11E8-B48F-1D18A9856A87","last_name":"Konstantinov","first_name":"Nikola H","full_name":"Konstantinov, Nikola H"},{"orcid":"0000-0001-8622-7887","full_name":"Lampert, Christoph","first_name":"Christoph","last_name":"Lampert","id":"40C20FD2-F248-11E8-B48F-1D18A9856A87"}],"_id":"13241","scopus_import":"1","title":"On the impossibility of fairness-aware learning from corrupted data","intvolume":"       
171","publication_status":"published","date_created":"2023-07-16T22:01:13Z","department":[{"_id":"ChLa"}],"article_processing_charge":"No","page":"59-83","quality_controlled":"1","publisher":"ML Research Press","external_id":{"arxiv":["2102.06004"]},"date_updated":"2023-09-26T10:44:37Z","year":"2022","citation":{"ista":"Konstantinov NH, Lampert C. 2022. On the impossibility of fairness-aware learning from corrupted data. Proceedings of Machine Learning Research. vol. 171, 59–83.","mla":"Konstantinov, Nikola H., and Christoph Lampert. “On the Impossibility of Fairness-Aware Learning from Corrupted Data.” <i>Proceedings of Machine Learning Research</i>, vol. 171, ML Research Press, 2022, pp. 59–83.","short":"N.H. Konstantinov, C. Lampert, in:, Proceedings of Machine Learning Research, ML Research Press, 2022, pp. 59–83.","ieee":"N. H. Konstantinov and C. Lampert, “On the impossibility of fairness-aware learning from corrupted data,” in <i>Proceedings of Machine Learning Research</i>, 2022, vol. 171, pp. 59–83.","chicago":"Konstantinov, Nikola H, and Christoph Lampert. “On the Impossibility of Fairness-Aware Learning from Corrupted Data.” In <i>Proceedings of Machine Learning Research</i>, 171:59–83. ML Research Press, 2022.","ama":"Konstantinov NH, Lampert C. On the impossibility of fairness-aware learning from corrupted data. In: <i>Proceedings of Machine Learning Research</i>. Vol 171. ML Research Press; 2022:59-83.","apa":"Konstantinov, N. H., &#38; Lampert, C. (2022). On the impossibility of fairness-aware learning from corrupted data. In <i>Proceedings of Machine Learning Research</i> (Vol. 171, pp. 59–83). ML Research Press."},"abstract":[{"text":"Addressing fairness concerns about machine learning models is a crucial step towards their long-term adoption in real-world automated systems. Many approaches for training fair models from data have been developed and an implicit assumption about such algorithms is that they are able to recover a fair model, despite potential historical biases in the data. In this work we show a number of impossibility results that indicate that there is no learning algorithm that can recover a fair model when a proportion of the dataset is subject to arbitrary manipulations. Specifically, we prove that there are situations in which an adversary can force any learner to return a biased classifier, with or without degrading accuracy, and that the strength of this bias increases for learning problems with underrepresented protected groups in the data. Our results emphasize on the importance of studying further data corruption models of various strength and of establishing stricter data collection practices for fairness-aware learning.","lang":"eng"}],"arxiv":1,"day":"01","volume":171,"acknowledgement":"This paper is a shortened, workshop version of Konstantinov and Lampert (2021),\r\nhttps://arxiv.org/abs/2102.06004. 
For further results, including an analysis of algorithms achieving the lower bounds from this paper, we refer to the full version."},{"file_date_updated":"2023-06-19T10:49:12Z","page":"8119-8129","quality_controlled":"1","publisher":"ML Research Press","author":[{"full_name":"Nguyen, Quynh","first_name":"Quynh","last_name":"Nguyen"},{"id":"27EB676C-8706-11E9-9510-7717E6697425","orcid":"0000-0002-3242-7020","full_name":"Mondelli, Marco","first_name":"Marco","last_name":"Mondelli"},{"full_name":"Montufar, Guido","last_name":"Montufar","first_name":"Guido"}],"_id":"13146","scopus_import":"1","title":"Tight bounds on the smallest Eigenvalue of the neural tangent kernel for deep ReLU networks","intvolume":"       139","publication_status":"published","department":[{"_id":"MaMo"}],"date_created":"2023-06-18T22:00:48Z","article_processing_charge":"No","ddc":["000"],"acknowledgement":"The authors would like to thank the anonymous reviewers for their helpful comments. MM was partially supported by the 2019 Lopez-Loreta Prize. QN and GM acknowledge support from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no 757983).","volume":139,"external_id":{"arxiv":["2012.11654"]},"date_updated":"2024-09-10T13:03:17Z","citation":{"ieee":"Q. Nguyen, M. Mondelli, and G. Montufar, “Tight bounds on the smallest Eigenvalue of the neural tangent kernel for deep ReLU networks,” in <i>Proceedings of the 38th International Conference on Machine Learning</i>, Virtual, 2021, vol. 139, pp. 8119–8129.","chicago":"Nguyen, Quynh, Marco Mondelli, and Guido Montufar. “Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks.” In <i>Proceedings of the 38th International Conference on Machine Learning</i>, 139:8119–29. ML Research Press, 2021.","apa":"Nguyen, Q., Mondelli, M., &#38; Montufar, G. (2021). Tight bounds on the smallest Eigenvalue of the neural tangent kernel for deep ReLU networks. In <i>Proceedings of the 38th International Conference on Machine Learning</i> (Vol. 139, pp. 8119–8129). Virtual: ML Research Press.","ama":"Nguyen Q, Mondelli M, Montufar G. Tight bounds on the smallest Eigenvalue of the neural tangent kernel for deep ReLU networks. In: <i>Proceedings of the 38th International Conference on Machine Learning</i>. Vol 139. ML Research Press; 2021:8119-8129.","ista":"Nguyen Q, Mondelli M, Montufar G. 2021. Tight bounds on the smallest Eigenvalue of the neural tangent kernel for deep ReLU networks. Proceedings of the 38th International Conference on Machine Learning. International Conference on Machine Learning vol. 139, 8119–8129.","mla":"Nguyen, Quynh, et al. “Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks.” <i>Proceedings of the 38th International Conference on Machine Learning</i>, vol. 139, ML Research Press, 2021, pp. 8119–29.","short":"Q. Nguyen, M. Mondelli, G. Montufar, in:, Proceedings of the 38th International Conference on Machine Learning, ML Research Press, 2021, pp. 8119–8129."},"year":"2021","abstract":[{"lang":"eng","text":"A recent line of work has analyzed the theoretical properties of deep neural networks via the Neural Tangent Kernel (NTK). In particular, the smallest eigenvalue of the NTK has been related to the memorization capacity, the global convergence of gradient descent algorithms and the generalization of deep nets. 
However, existing results either provide bounds in the two-layer setting or assume that the spectrum of the NTK matrices is bounded away from 0 for multi-layer networks. In this paper, we provide tight bounds on the smallest eigenvalue of NTK matrices for deep ReLU nets, both in the limiting case of infinite widths and for finite widths. In the finite-width setting, the network architectures we consider are fairly general: we require the existence of a wide layer with roughly order of N neurons, N being the number of data samples; and the scaling of the remaining layer widths is arbitrary (up to logarithmic factors). To obtain our results, we analyze various quantities of independent interest: we give lower bounds on the smallest singular value of hidden feature matrices, and upper bounds on the Lipschitz constant of input-output feature maps."}],"arxiv":1,"day":"01","language":[{"iso":"eng"}],"conference":{"location":"Virtual","end_date":"2021-07-24","start_date":"2021-07-18","name":"International Conference on Machine Learning"},"publication":"Proceedings of the 38th International Conference on Machine Learning","has_accepted_license":"1","month":"07","oa_version":"Published Version","project":[{"_id":"059876FA-7A3F-11EA-A408-12923DDC885E","name":"Prix Lopez-Loretta 2019 - Marco Mondelli"}],"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","status":"public","file":[{"file_id":"13155","creator":"dernst","relation":"main_file","success":1,"access_level":"open_access","date_updated":"2023-06-19T10:49:12Z","file_name":"2021_PMLR_Nguyen.pdf","content_type":"application/pdf","date_created":"2023-06-19T10:49:12Z","file_size":591332,"checksum":"19489cf5e16a0596b1f92e317d97c9b0"}],"date_published":"2021-07-01T00:00:00Z","type":"conference","tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)"},"oa":1,"publication_identifier":{"eissn":["2640-3498"],"isbn":["9781713845065"]}},{"file_date_updated":"2023-06-19T10:41:05Z","quality_controlled":"1","ec_funded":1,"page":"196-206","publisher":"ML Research Press","author":[{"last_name":"Alimisis","first_name":"Foivos","full_name":"Alimisis, Foivos"},{"id":"11396234-BB50-11E9-B24C-90FCE5697425","first_name":"Peter","last_name":"Davies","orcid":"0000-0002-5646-9524","full_name":"Davies, Peter"},{"orcid":"0000-0003-3650-940X","full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87"}],"scopus_import":"1","_id":"13147","intvolume":"       139","title":"Communication-efficient distributed optimization with quantized preconditioners","department":[{"_id":"DaAl"}],"date_created":"2023-06-18T22:00:48Z","article_processing_charge":"No","publication_status":"published","ddc":["000"],"acknowledgement":"The authors would like to thank Janne Korhonen, Aurelien Lucchi, Celestine MendlerDunner and Antonio Orvieto for helpful discussions. FA ¨and DA were supported during this work by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML). PD was supported by the European Union’s Horizon 2020 programme under the Marie Skłodowska-Curie grant agreement No. 754411.","volume":139,"external_id":{"arxiv":["2102.07214"]},"year":"2021","citation":{"chicago":"Alimisis, Foivos, Peter Davies, and Dan-Adrian Alistarh. 
“Communication-Efficient Distributed Optimization with Quantized Preconditioners.” In <i>Proceedings of the 38th International Conference on Machine Learning</i>, 139:196–206. ML Research Press, 2021.","ieee":"F. Alimisis, P. Davies, and D.-A. Alistarh, “Communication-efficient distributed optimization with quantized preconditioners,” in <i>Proceedings of the 38th International Conference on Machine Learning</i>, Virtual, 2021, vol. 139, pp. 196–206.","apa":"Alimisis, F., Davies, P., &#38; Alistarh, D.-A. (2021). Communication-efficient distributed optimization with quantized preconditioners. In <i>Proceedings of the 38th International Conference on Machine Learning</i> (Vol. 139, pp. 196–206). Virtual: ML Research Press.","ama":"Alimisis F, Davies P, Alistarh D-A. Communication-efficient distributed optimization with quantized preconditioners. In: <i>Proceedings of the 38th International Conference on Machine Learning</i>. Vol 139. ML Research Press; 2021:196-206.","ista":"Alimisis F, Davies P, Alistarh D-A. 2021. Communication-efficient distributed optimization with quantized preconditioners. Proceedings of the 38th International Conference on Machine Learning. International Conference on Machine Learning vol. 139, 196–206.","short":"F. Alimisis, P. Davies, D.-A. Alistarh, in:, Proceedings of the 38th International Conference on Machine Learning, ML Research Press, 2021, pp. 196–206.","mla":"Alimisis, Foivos, et al. “Communication-Efficient Distributed Optimization with Quantized Preconditioners.” <i>Proceedings of the 38th International Conference on Machine Learning</i>, vol. 139, ML Research Press, 2021, pp. 196–206."},"date_updated":"2023-06-19T10:44:38Z","abstract":[{"text":"We investigate fast and communication-efficient algorithms for the classic problem of minimizing a sum of strongly convex and smooth functions that are distributed among n\r\n different nodes, which can communicate using a limited number of bits. Most previous communication-efficient approaches for this problem are limited to first-order optimization, and therefore have \\emph{linear} dependence on the condition number in their communication complexity. We show that this dependence is not inherent: communication-efficient methods can in fact have sublinear dependence on the condition number. For this, we design and analyze the first communication-efficient distributed variants of preconditioned gradient descent for Generalized Linear Models, and for Newton’s method. Our results rely on a new technique for quantizing both the preconditioner and the descent direction at each step of the algorithms, while controlling their convergence rate. 
We also validate our findings experimentally, showing faster convergence and reduced communication relative to previous methods.","lang":"eng"}],"day":"01","arxiv":1,"language":[{"iso":"eng"}],"conference":{"name":"International Conference on Machine Learning","start_date":"2021-07-18","end_date":"2021-07-24","location":"Virtual"},"has_accepted_license":"1","publication":"Proceedings of the 38th International Conference on Machine Learning","month":"07","project":[{"_id":"268A44D6-B435-11E9-9278-68D0E5697425","call_identifier":"H2020","name":"Elastic Coordination for Scalable Machine Learning","grant_number":"805223"},{"name":"ISTplus - Postdoctoral Fellowships","grant_number":"754411","call_identifier":"H2020","_id":"260C2330-B435-11E9-9278-68D0E5697425"}],"oa_version":"Published Version","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","status":"public","file":[{"access_level":"open_access","success":1,"relation":"main_file","creator":"dernst","file_id":"13154","checksum":"7ec0d59bac268b49c76bf2e036dedd7a","file_size":429087,"date_created":"2023-06-19T10:41:05Z","file_name":"2021_PMLR_Alimisis.pdf","content_type":"application/pdf","date_updated":"2023-06-19T10:41:05Z"}],"type":"conference","date_published":"2021-07-01T00:00:00Z","tmp":{"legal_code_url":"https://creativecommons.org/licenses/by/4.0/legalcode","short":"CC BY (4.0)","image":"/images/cc_by.png","name":"Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)"},"oa":1,"publication_identifier":{"isbn":["9781713845065"],"eissn":["2640-3498"]}},{"extern":"1","volume":70,"external_id":{"arxiv":["1706.05826"]},"date_updated":"2023-02-09T09:15:31Z","year":"2017","citation":{"mla":"Wang, Di, et al. “Capacity Releasing Diffusion for Speed and Locality.” <i>Proceedings of the 34th International Conference on Machine Learning</i>, vol. 70, ML Research Press, 2017, pp. 3598–607.","short":"D. Wang, K. Fountoulakis, M.H. Henzinger, M.W. Mahoney, S. Rao, in:, Proceedings of the 34th International Conference on Machine Learning, ML Research Press, 2017, pp. 3598–3607.","ista":"Wang D, Fountoulakis K, Henzinger MH, Mahoney MW, Rao S. 2017. Capacity releasing diffusion for speed and locality. Proceedings of the 34th International Conference on Machine Learning. International Conference on Machine Learning, PMLR, vol. 70, 3598–3607.","apa":"Wang, D., Fountoulakis, K., Henzinger, M. H., Mahoney, M. W., &#38; Rao, S. (2017). Capacity releasing diffusion for speed and locality. In <i>Proceedings of the 34th International Conference on Machine Learning</i> (Vol. 70, pp. 3598–3607). Sydney, Australia: ML Research Press.","ama":"Wang D, Fountoulakis K, Henzinger MH, Mahoney MW, Rao S. Capacity releasing diffusion for speed and locality. In: <i>Proceedings of the 34th International Conference on Machine Learning</i>. Vol 70. ML Research Press; 2017:3598-3607.","chicago":"Wang, Di, Kimon Fountoulakis, Monika H Henzinger, Michael W. Mahoney, and Satish Rao. “Capacity Releasing Diffusion for Speed and Locality.” In <i>Proceedings of the 34th International Conference on Machine Learning</i>, 70:3598–3607. ML Research Press, 2017.","ieee":"D. Wang, K. Fountoulakis, M. H. Henzinger, M. W. Mahoney, and S. Rao, “Capacity releasing diffusion for speed and locality,” in <i>Proceedings of the 34th International Conference on Machine Learning</i>, Sydney, Australia, 2017, vol. 70, pp. 
3598–3607."},"abstract":[{"text":"Diffusions and related random walk procedures are of central importance in many areas of machine learning, data analysis, and applied mathematics. Because they spread mass agnostically at each step in an iterative manner, they can sometimes spread mass “too aggressively,” thereby failing to find the “right” clusters. We introduce a novel Capacity Releasing Diffusion (CRD) Process, which is both faster and stays more local than the classical spectral diffusion process. As an application, we use our CRD Process to develop an improved local algorithm for graph clustering. Our local graph clustering method can find local clusters in a model of clustering where one begins the CRD Process in a cluster whose vertices are connected better internally than externally by an O(log^2 n) factor, where n is the number of nodes in the cluster. Thus, our CRD Process is the first local graph clustering algorithm that is not subject to the well-known quadratic Cheeger barrier. Our result requires a certain smoothness condition, which we expect to be an artifact of our analysis. Our empirical evaluation demonstrates improved results, in particular for realistic social graphs where there are moderately good—but not very good—clusters.","lang":"eng"}],"arxiv":1,"day":"01","page":"3598-3607","quality_controlled":"1","publisher":"ML Research Press","author":[{"full_name":"Wang, Di","first_name":"Di","last_name":"Wang"},{"full_name":"Fountoulakis, Kimon","last_name":"Fountoulakis","first_name":"Kimon"},{"full_name":"Henzinger, Monika H","orcid":"0000-0002-5008-6530","last_name":"Henzinger","first_name":"Monika H","id":"540c9bbd-f2de-11ec-812d-d04a5be85630"},{"first_name":"Michael W.","last_name":"Mahoney","full_name":"Mahoney, Michael W."},{"first_name":"Satish","last_name":"Rao","full_name":"Rao, Satish"}],"_id":"11651","title":"Capacity releasing diffusion for speed and locality","alternative_title":["PMLR"],"intvolume":"        70","publication_status":"published","date_created":"2022-07-25T13:59:21Z","article_processing_charge":"No","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","status":"public","main_file_link":[{"url":"http://proceedings.mlr.press/v70/wang17b/wang17b.pdf","open_access":"1"}],"date_published":"2017-09-01T00:00:00Z","type":"conference","oa":1,"publication_identifier":{"eissn":["2640-3498"]},"language":[{"iso":"eng"}],"conference":{"start_date":"2017-08-06","name":"International Conference on Machine Learning","end_date":"2017-08-11","location":"Sydney, Australia"},"publication":"Proceedings of the 34th International Conference on Machine Learning","month":"09","oa_version":"Published Version"}]
