Leveraging structure in Computer Vision tasks for flexible Deep Learning models

Royer, Amélie

Royer A. 2020. Leveraging structure in Computer Vision tasks for flexible Deep Learning models. Institute of Science and Technology Austria.

Download

2020_Thesis_Royer.pdf 30.22 MB [Published Version] Restricted

thesis_sources.zip

DOI

10.15479/AT:ISTA:8390

Thesis | PhD | Published | English

Author

Royer, Amélie (ISTA)

Supervisor

Lampert, Christoph (ISTA)

Department

Lampert Group

Series Title

ISTA Thesis

Abstract

Deep neural networks have established a new standard for data-dependent feature extraction pipelines in the Computer Vision literature. Despite their remarkable performance in the standard supervised learning scenario, i.e., when models are trained with labeled data and tested on samples that follow a similar distribution, neural networks have been shown to struggle with more advanced forms of generalization, such as transferring knowledge across visually different domains, or generalizing to new, unseen combinations of known concepts. In this thesis we argue that, in contrast to the usual black-box behavior of neural networks, leveraging more structured internal representations is a promising direction for tackling such problems. In particular, we focus on two forms of structure. First, we tackle modularity. We show that (i) compositional architectures are a natural tool for modeling reasoning tasks, in that they efficiently capture their combinatorial nature, which is key for generalizing beyond the compositions seen during training. We investigate how to learn such models, both formally and experimentally, for the task of abstract visual reasoning. Then, we show that (ii) in some settings, modularity allows us to efficiently break down complex tasks into smaller, easier modules, thereby improving computational efficiency. We study this behavior in the context of generative models for colorization, as well as for small object detection. Second, we investigate the inherently layered structure of representations learned by neural networks, and analyze its role in the context of transfer learning and domain adaptation across visually dissimilar domains.

Publishing Year

2020

Date Published

2020-09-14

Publisher

Institute of Science and Technology Austria

Acknowledgement

Last but not least, I would like to acknowledge the support of the IST IT and scientific computing team for helping provide a great work environment.

Acknowledged SSUs

Campus IT
Scientific Computing

Page

197

ISBN

978-3-99078-007-7

ISSN

2663-337X

IST-REx-ID

8390

Cite this

Royer A. Leveraging structure in Computer Vision tasks for flexible Deep Learning models. 2020. doi:10.15479/AT:ISTA:8390

Royer, A. (2020). Leveraging structure in Computer Vision tasks for flexible Deep Learning models. Institute of Science and Technology Austria. https://doi.org/10.15479/AT:ISTA:8390

Royer, Amélie. “Leveraging Structure in Computer Vision Tasks for Flexible Deep Learning Models.” Institute of Science and Technology Austria, 2020. https://doi.org/10.15479/AT:ISTA:8390.

A. Royer, “Leveraging structure in Computer Vision tasks for flexible Deep Learning models,” Institute of Science and Technology Austria, 2020.

Royer A. 2020. Leveraging structure in Computer Vision tasks for flexible Deep Learning models. Institute of Science and Technology Austria.

Royer, Amélie. Leveraging Structure in Computer Vision Tasks for Flexible Deep Learning Models. Institute of Science and Technology Austria, 2020, doi:10.15479/AT:ISTA:8390.

All files available under the following license(s):

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)