---
_id: '13074'
abstract:
- lang: eng
  text: "Deep learning has become an integral part of a large number of important
    applications, and many of the recent breakthroughs have been enabled by the ability
    to train very large models, capable of capturing complex patterns and relationships
    from the data. At the same time, the massive sizes of modern deep learning models
    have made their deployment to smaller devices more challenging; this is particularly
    important, as in many applications the users rely on accurate deep learning predictions,
    but they only have access to devices with limited memory and compute power. One
    solution to this problem is to prune neural networks, by setting as many of their
    parameters as possible to zero, to obtain accurate sparse models with lower memory
    footprint. Despite the great research progress in obtaining sparse models that
    preserve accuracy, while satisfying memory and computational constraints, there
    are still many challenges associated with efficiently training sparse models,
    as well as understanding their generalization properties.\r\n\r\nThe focus of
    this thesis is to investigate how the training process of sparse models can be
    made more efficient, and to understand the differences between sparse and dense
    models in terms of how well they can generalize to changes in the data distribution.
    We first study a method for co-training sparse and dense models, at a lower cost
    compared to regular training. With our method we can obtain very accurate sparse
    networks, and dense models that can recover the baseline accuracy. Furthermore,
    we are able to more easily analyze the differences, at prediction level, between
    the sparse-dense model pairs. Next, we investigate the generalization properties
    of sparse neural networks in more detail, by studying how well different sparse
    models trained on a larger task can adapt to smaller, more specialized tasks,
    in a transfer learning scenario. Our analysis across multiple pruning methods
    and sparsity levels reveals that sparse models provide features that can transfer
    similarly to or better than the dense baseline. However, the choice of the pruning
    method plays an important role, and can influence the results when the features
    are fixed (linear finetuning), or when they are allowed to adapt to the new task
    (full finetuning). Using sparse models with fixed masks for finetuning on new
    tasks has an important practical advantage, as it enables training neural networks
    on smaller devices. However, one drawback of current pruning methods is that the
    entire training cycle has to be repeated to obtain the initial sparse model, for
    every sparsity target; as a consequence, the entire training process is costly, and
    multiple models need to be stored. In the last part of the thesis we propose
    a method that can train accurate dense models that are compressible in a single
    step, to multiple sparsity levels, without additional finetuning. Our method results
    in sparse models that can be competitive with existing pruning methods, and which
    can also successfully generalize to new tasks."
acknowledged_ssus:
- _id: ScienComp
alternative_title:
- ISTA Thesis
article_processing_charge: No
author:
- first_name: Elena-Alexandra
  full_name: Peste, Elena-Alexandra
  id: 32D78294-F248-11E8-B48F-1D18A9856A87
  last_name: Peste
citation:
  ama: Peste E-A. Efficiency and generalization of sparse neural networks. 2023. doi:<a
    href="https://doi.org/10.15479/at:ista:13074">10.15479/at:ista:13074</a>
  apa: Peste, E.-A. (2023). <i>Efficiency and generalization of sparse neural networks</i>.
    Institute of Science and Technology Austria. <a href="https://doi.org/10.15479/at:ista:13074">https://doi.org/10.15479/at:ista:13074</a>
  chicago: Peste, Elena-Alexandra. “Efficiency and Generalization of Sparse Neural
    Networks.” Institute of Science and Technology Austria, 2023. <a href="https://doi.org/10.15479/at:ista:13074">https://doi.org/10.15479/at:ista:13074</a>.
  ieee: E.-A. Peste, “Efficiency and generalization of sparse neural networks,” Institute
    of Science and Technology Austria, 2023.
  ista: Peste E-A. 2023. Efficiency and generalization of sparse neural networks.
    Institute of Science and Technology Austria.
  mla: Peste, Elena-Alexandra. <i>Efficiency and Generalization of Sparse Neural Networks</i>.
    Institute of Science and Technology Austria, 2023, doi:<a href="https://doi.org/10.15479/at:ista:13074">10.15479/at:ista:13074</a>.
  short: E.-A. Peste, Efficiency and Generalization of Sparse Neural Networks, Institute
    of Science and Technology Austria, 2023.
date_created: 2023-05-23T17:07:53Z
date_published: 2023-05-23T00:00:00Z
date_updated: 2023-08-04T10:33:27Z
day: '23'
ddc:
- '000'
degree_awarded: PhD
department:
- _id: GradSch
- _id: DaAl
- _id: ChLa
doi: 10.15479/at:ista:13074
ec_funded: 1
file:
- access_level: open_access
  checksum: 6b3354968403cb9d48cc5a83611fb571
  content_type: application/pdf
  creator: epeste
  date_created: 2023-05-24T16:11:16Z
  date_updated: 2023-05-24T16:11:16Z
  file_id: '13087'
  file_name: PhD_Thesis_Alexandra_Peste_final.pdf
  file_size: 2152072
  relation: main_file
  success: 1
- access_level: closed
  checksum: 8d0df94bbcf4db72c991f22503b3fd60
  content_type: application/zip
  creator: epeste
  date_created: 2023-05-24T16:12:59Z
  date_updated: 2023-05-24T16:12:59Z
  file_id: '13088'
  file_name: PhD_Thesis_APeste.zip
  file_size: 1658293
  relation: source_file
file_date_updated: 2023-05-24T16:12:59Z
has_accepted_license: '1'
language:
- iso: eng
month: '05'
oa: 1
oa_version: Published Version
page: '147'
project:
- _id: 2564DBCA-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '665385'
  name: International IST Doctoral Program
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication_identifier:
  issn:
  - 2663-337X
publication_status: published
publisher: Institute of Science and Technology Austria
related_material:
  record:
  - id: '11458'
    relation: part_of_dissertation
    status: public
  - id: '13053'
    relation: part_of_dissertation
    status: public
  - id: '12299'
    relation: part_of_dissertation
    status: public
status: public
supervisor:
- first_name: Christoph
  full_name: Lampert, Christoph
  id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
  last_name: Lampert
  orcid: 0000-0001-8622-7887
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
title: Efficiency and generalization of sparse neural networks
type: dissertation
user_id: 8b945eb4-e2f2-11eb-945a-df72226e66a9
year: '2023'
...
---
_id: '10429'
abstract:
- lang: eng
  text: "The scalability of concurrent data structures and distributed algorithms
    strongly depends on reducing the contention for shared resources and the costs
    of synchronization and communication. We show how such cost reductions can be
    attained by relaxing the strict consistency conditions required by sequential
    implementations. In the first part of the thesis, we consider relaxation in the
    context of concurrent data structures. Specifically, in data structures such
    as priority queues, imposing strong semantics renders scalability impossible,
    since a correct implementation of the remove operation should return only the
    element with highest priority. Intuitively, attempting to invoke remove operations
    concurrently creates a race condition. This bottleneck can be circumvented by
    relaxing the semantics of the affected data structure, thus allowing the removal of
    elements that are not required to have the highest priority. We prove that
    randomized implementations of relaxed data structures provide guarantees
    on the priority of the removed elements even under concurrency. Additionally,
    we show that in some cases relaxed data structures can be used to scale
    classical algorithms that are usually implemented with exact ones. In the
    second part, we study parallel variants of the stochastic gradient descent (SGD)
    algorithm, which distribute computation among multiple processors, thus reducing
    the running time. Unfortunately, in order for standard parallel SGD to succeed,
    each processor has to maintain a local copy of the necessary model parameters,
    which is identical to the local copies of other processors; the overheads from
    this perfect consistency in terms of communication and synchronization can negate
    the speedup gained by distributing the computation. We show that the consistency
    conditions required by SGD can be relaxed, allowing the algorithm to be more
    flexible in terms of tolerating quantized communication, asynchrony, or even crash
    faults, while its convergence remains asymptotically the same."
alternative_title:
- ISTA Thesis
article_processing_charge: No
author:
- first_name: Giorgi
  full_name: Nadiradze, Giorgi
  id: 3279A00C-F248-11E8-B48F-1D18A9856A87
  last_name: Nadiradze
  orcid: 0000-0001-5634-0731
citation:
  ama: Nadiradze G. On achieving scalability through relaxation. 2021. doi:<a href="https://doi.org/10.15479/at:ista:10429">10.15479/at:ista:10429</a>
  apa: Nadiradze, G. (2021). <i>On achieving scalability through relaxation</i>. Institute
    of Science and Technology Austria. <a href="https://doi.org/10.15479/at:ista:10429">https://doi.org/10.15479/at:ista:10429</a>
  chicago: Nadiradze, Giorgi. “On Achieving Scalability through Relaxation.” Institute
    of Science and Technology Austria, 2021. <a href="https://doi.org/10.15479/at:ista:10429">https://doi.org/10.15479/at:ista:10429</a>.
  ieee: G. Nadiradze, “On achieving scalability through relaxation,” Institute of
    Science and Technology Austria, 2021.
  ista: Nadiradze G. 2021. On achieving scalability through relaxation. Institute
    of Science and Technology Austria.
  mla: Nadiradze, Giorgi. <i>On Achieving Scalability through Relaxation</i>. Institute
    of Science and Technology Austria, 2021, doi:<a href="https://doi.org/10.15479/at:ista:10429">10.15479/at:ista:10429</a>.
  short: G. Nadiradze, On Achieving Scalability through Relaxation, Institute of Science
    and Technology Austria, 2021.
date_created: 2021-12-08T21:52:28Z
date_published: 2021-12-09T00:00:00Z
date_updated: 2023-10-17T11:48:55Z
day: '09'
ddc:
- '000'
degree_awarded: PhD
department:
- _id: GradSch
- _id: DaAl
doi: 10.15479/at:ista:10429
ec_funded: 1
file:
- access_level: open_access
  checksum: 6bf14e9a523387328f016c0689f5e10e
  content_type: application/pdf
  creator: gnadirad
  date_created: 2021-12-09T17:47:49Z
  date_updated: 2021-12-09T17:47:49Z
  file_id: '10436'
  file_name: Thesis_Final_09_12_2021.pdf
  file_size: 2370859
  relation: main_file
  success: 1
- access_level: closed
  checksum: 914d6c5ca86bd0add471971a8f4c4341
  content_type: application/zip
  creator: gnadirad
  date_created: 2021-12-09T17:47:49Z
  date_updated: 2022-03-28T12:55:12Z
  file_id: '10437'
  file_name: Thesis_Final_09_12_2021.zip
  file_size: 2596924
  relation: source_file
file_date_updated: 2022-03-28T12:55:12Z
has_accepted_license: '1'
language:
- iso: eng
month: '12'
oa: 1
oa_version: Published Version
page: '132'
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication_identifier:
  issn:
  - 2663-337X
publication_status: published
publisher: Institute of Science and Technology Austria
related_material:
  record:
  - id: '10432'
    relation: part_of_dissertation
    status: public
  - id: '6673'
    relation: part_of_dissertation
    status: public
  - id: '5965'
    relation: part_of_dissertation
    status: public
  - id: '10435'
    relation: part_of_dissertation
    status: public
status: public
supervisor:
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
title: On achieving scalability through relaxation
type: dissertation
user_id: c635000d-4b10-11ee-a964-aac5a93f6ac1
year: '2021'
...