[{"ec_funded":1,"related_material":{"record":[{"relation":"part_of_dissertation","id":"11458","status":"public"},{"relation":"part_of_dissertation","id":"13053","status":"public"},{"relation":"part_of_dissertation","id":"12299","status":"public"}]},"degree_awarded":"PhD","year":"2023","date_created":"2023-05-23T17:07:53Z","citation":{"apa":"Peste, E.-A. (2023). <i>Efficiency and generalization of sparse neural networks</i>. Institute of Science and Technology Austria. <a href=\"https://doi.org/10.15479/at:ista:13074\">https://doi.org/10.15479/at:ista:13074</a>","ista":"Peste E-A. 2023. Efficiency and generalization of sparse neural networks. Institute of Science and Technology Austria.","chicago":"Peste, Elena-Alexandra. “Efficiency and Generalization of Sparse Neural Networks.” Institute of Science and Technology Austria, 2023. <a href=\"https://doi.org/10.15479/at:ista:13074\">https://doi.org/10.15479/at:ista:13074</a>.","mla":"Peste, Elena-Alexandra. <i>Efficiency and Generalization of Sparse Neural Networks</i>. Institute of Science and Technology Austria, 2023, doi:<a href=\"https://doi.org/10.15479/at:ista:13074\">10.15479/at:ista:13074</a>.","ieee":"E.-A. Peste, “Efficiency and generalization of sparse neural networks,” Institute of Science and Technology Austria, 2023.","ama":"Peste E-A. Efficiency and generalization of sparse neural networks. 2023. doi:<a href=\"https://doi.org/10.15479/at:ista:13074\">10.15479/at:ista:13074</a>","short":"E.-A. 
Peste, Efficiency and Generalization of Sparse Neural Networks, Institute of Science and Technology Austria, 2023."},"doi":"10.15479/at:ista:13074","file_date_updated":"2023-05-24T16:12:59Z","_id":"13074","acknowledged_ssus":[{"_id":"ScienComp"}],"ddc":["000"],"abstract":[{"text":"Deep learning has become an integral part of a large number of important applications, and many of the recent breakthroughs have been enabled by the ability to train very large models, capable of capturing complex patterns and relationships from the data. At the same time, the massive sizes of modern deep learning models have made their deployment to smaller devices more challenging; this is particularly important, as in many applications the users rely on accurate deep learning predictions, but they only have access to devices with limited memory and compute power. One solution to this problem is to prune neural networks, by setting as many of their parameters as possible to zero, to obtain accurate sparse models with a lower memory footprint. Despite the great research progress in obtaining sparse models that preserve accuracy, while satisfying memory and computational constraints, there are still many challenges associated with efficiently training sparse models, as well as understanding their generalization properties.\r\n\r\nThe focus of this thesis is to investigate how the training process of sparse models can be made more efficient, and to understand the differences between sparse and dense models in terms of how well they can generalize to changes in the data distribution. We first study a method for co-training sparse and dense models, at a lower cost compared to regular training. With our method we can obtain very accurate sparse networks, and dense models that can recover the baseline accuracy. Furthermore, we are able to more easily analyze the differences, at prediction level, between the sparse-dense model pairs. 
Next, we investigate the generalization properties of sparse neural networks in more detail, by studying how well different sparse models trained on a larger task can adapt to smaller, more specialized tasks, in a transfer learning scenario. Our analysis across multiple pruning methods and sparsity levels reveals that sparse models provide features that can transfer similarly to or better than the dense baseline. However, the choice of the pruning method plays an important role, and can influence the results when the features are fixed (linear finetuning), or when they are allowed to adapt to the new task (full finetuning). Using sparse models with fixed masks for finetuning on new tasks has an important practical advantage, as it enables training neural networks on smaller devices. However, one drawback of current pruning methods is that the entire training cycle has to be repeated to obtain the initial sparse model, for every sparsity target; in consequence, the entire training process is costly and also multiple models need to be stored. In the last part of the thesis we propose a method that can train accurate dense models that are compressible in a single step, to multiple sparsity levels, without additional finetuning. 
Our method results in sparse models that can be competitive with existing pruning methods, and which can also successfully generalize to new tasks.","lang":"eng"}],"publication_status":"published","publisher":"Institute of Science and Technology Austria","date_published":"2023-05-23T00:00:00Z","supervisor":[{"first_name":"Christoph","full_name":"Lampert, Christoph","orcid":"0000-0001-8622-7887","id":"40C20FD2-F248-11E8-B48F-1D18A9856A87","last_name":"Lampert"},{"first_name":"Dan-Adrian","full_name":"Alistarh, Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X"}],"page":"147","department":[{"_id":"GradSch"},{"_id":"DaAl"},{"_id":"ChLa"}],"project":[{"name":"International IST Doctoral Program","call_identifier":"H2020","grant_number":"665385","_id":"2564DBCA-B435-11E9-9278-68D0E5697425"},{"grant_number":"805223","_id":"268A44D6-B435-11E9-9278-68D0E5697425","name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020"}],"has_accepted_license":"1","oa":1,"date_updated":"2023-08-04T10:33:27Z","article_processing_charge":"No","title":"Efficiency and generalization of sparse neural networks","language":[{"iso":"eng"}],"publication_identifier":{"issn":["2663-337X"]},"month":"05","oa_version":"Published 
Version","day":"23","file":[{"date_created":"2023-05-24T16:11:16Z","date_updated":"2023-05-24T16:11:16Z","success":1,"file_name":"PhD_Thesis_Alexandra_Peste_final.pdf","file_id":"13087","checksum":"6b3354968403cb9d48cc5a83611fb571","file_size":2152072,"access_level":"open_access","relation":"main_file","content_type":"application/pdf","creator":"epeste"},{"date_created":"2023-05-24T16:12:59Z","date_updated":"2023-05-24T16:12:59Z","checksum":"8d0df94bbcf4db72c991f22503b3fd60","file_name":"PhD_Thesis_APeste.zip","file_id":"13088","content_type":"application/zip","file_size":1658293,"access_level":"closed","relation":"source_file","creator":"epeste"}],"user_id":"8b945eb4-e2f2-11eb-945a-df72226e66a9","author":[{"full_name":"Peste, Elena-Alexandra","first_name":"Elena-Alexandra","id":"32D78294-F248-11E8-B48F-1D18A9856A87","last_name":"Peste"}],"alternative_title":["ISTA Thesis"],"type":"dissertation","status":"public"},{"file_date_updated":"2022-03-28T12:55:12Z","_id":"10429","doi":"10.15479/at:ista:10429","citation":{"chicago":"Nadiradze, Giorgi. “On Achieving Scalability through Relaxation.” Institute of Science and Technology Austria, 2021. <a href=\"https://doi.org/10.15479/at:ista:10429\">https://doi.org/10.15479/at:ista:10429</a>.","mla":"Nadiradze, Giorgi. <i>On Achieving Scalability through Relaxation</i>. Institute of Science and Technology Austria, 2021, doi:<a href=\"https://doi.org/10.15479/at:ista:10429\">10.15479/at:ista:10429</a>.","ista":"Nadiradze G. 2021. On achieving scalability through relaxation. Institute of Science and Technology Austria.","ama":"Nadiradze G. On achieving scalability through relaxation. 2021. doi:<a href=\"https://doi.org/10.15479/at:ista:10429\">10.15479/at:ista:10429</a>","ieee":"G. Nadiradze, “On achieving scalability through relaxation,” Institute of Science and Technology Austria, 2021.","short":"G. Nadiradze, On Achieving Scalability through Relaxation, Institute of Science and Technology Austria, 2021.","apa":"Nadiradze, G. 
(2021). <i>On achieving scalability through relaxation</i>. Institute of Science and Technology Austria. <a href=\"https://doi.org/10.15479/at:ista:10429\">https://doi.org/10.15479/at:ista:10429</a>"},"date_created":"2021-12-08T21:52:28Z","year":"2021","ec_funded":1,"related_material":{"record":[{"status":"public","relation":"part_of_dissertation","id":"10432"},{"status":"public","id":"6673","relation":"part_of_dissertation"},{"status":"public","id":"5965","relation":"part_of_dissertation"},{"relation":"part_of_dissertation","id":"10435","status":"public"}]},"degree_awarded":"PhD","project":[{"_id":"268A44D6-B435-11E9-9278-68D0E5697425","grant_number":"805223","name":"Elastic Coordination for Scalable Machine Learning","call_identifier":"H2020"}],"page":"132","department":[{"_id":"GradSch"},{"_id":"DaAl"}],"publisher":"Institute of Science and Technology Austria","supervisor":[{"full_name":"Alistarh, Dan-Adrian","first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0003-3650-940X"}],"date_published":"2021-12-09T00:00:00Z","publication_status":"published","abstract":[{"lang":"eng","text":"The scalability of concurrent data structures and distributed algorithms strongly depends on reducing the contention for shared resources and the costs of synchronization and communication. We show how such cost reductions can be attained by relaxing the strict consistency conditions required by sequential implementations. In the first part of the thesis, we consider relaxation in the context of concurrent data structures. Specifically, in data structures such as priority queues, imposing strong semantics renders scalability impossible, since a correct implementation of the remove operation should return only the element with highest priority. Intuitively, attempting to invoke remove operations concurrently creates a race condition. 
This bottleneck can be circumvented by relaxing the semantics of the affected data structure, thus allowing removal of the elements which are no longer required to have the highest priority. We prove that the randomized implementations of relaxed data structures provide provable guarantees on the priority of the removed elements even under concurrency. Additionally, we show that in some cases the relaxed data structures can be used to scale the classical algorithms which are usually implemented with the exact ones. In the second part, we study parallel variants of the stochastic gradient descent (SGD) algorithm, which distribute computation among multiple processors, thus reducing the running time. Unfortunately, in order for standard parallel SGD to succeed, each processor has to maintain a local copy of the necessary model parameter, which is identical to the local copies of other processors; the overheads from this perfect consistency in terms of communication and synchronization can negate the speedup gained by distributing the computation. 
We show that the consistency conditions required by SGD can be relaxed, allowing the algorithm to be more flexible in terms of tolerating quantized communication, asynchrony, or even crash faults, while its convergence remains asymptotically the same."}],"ddc":["000"],"language":[{"iso":"eng"}],"title":"On achieving scalability through relaxation","oa":1,"date_updated":"2023-10-17T11:48:55Z","article_processing_charge":"No","has_accepted_license":"1","alternative_title":["ISTA Thesis"],"type":"dissertation","status":"public","author":[{"full_name":"Nadiradze, Giorgi","first_name":"Giorgi","last_name":"Nadiradze","id":"3279A00C-F248-11E8-B48F-1D18A9856A87","orcid":"0000-0001-5634-0731"}],"day":"09","oa_version":"Published Version","file":[{"success":1,"file_id":"10436","file_name":"Thesis_Final_09_12_2021.pdf","checksum":"6bf14e9a523387328f016c0689f5e10e","date_created":"2021-12-09T17:47:49Z","date_updated":"2021-12-09T17:47:49Z","creator":"gnadirad","file_size":2370859,"access_level":"open_access","content_type":"application/pdf","relation":"main_file"},{"date_created":"2021-12-09T17:47:49Z","date_updated":"2022-03-28T12:55:12Z","file_id":"10437","file_name":"Thesis_Final_09_12_2021.zip","checksum":"914d6c5ca86bd0add471971a8f4c4341","file_size":2596924,"relation":"source_file","access_level":"closed","content_type":"application/zip","creator":"gnadirad"}],"user_id":"c635000d-4b10-11ee-a964-aac5a93f6ac1","publication_identifier":{"issn":["2663-337X"]},"month":"12"}]
