Underspecification in deep learning
Phuong M. 2021. Underspecification in deep learning. Institute of Science and Technology Austria.
Thesis | PhD | Published | English
Author
Mary Phuong
Series Title
ISTA Thesis
Abstract
Deep learning is best known for its empirical success across a wide range of applications spanning computer vision, natural language processing and speech. Of equal significance, though perhaps less known, are its ramifications for learning theory: deep networks have been observed to perform surprisingly well in the high-capacity regime, also known as the overfitting or underspecified regime. Classically, this regime on the far right of the bias-variance curve is associated with poor generalisation; however, recent experiments with deep networks challenge this view.
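For reference, the classical picture invoked here is the bias-variance decomposition of the expected squared error; in standard notation (not specific to this thesis), for data y = f(x) + ε with noise variance σ² and an estimator trained on a random dataset D:

    \mathbb{E}_{D,\varepsilon}\big[(y - \hat{f}(x; D))^2\big]
      = \underbrace{\big(f(x) - \mathbb{E}_D[\hat{f}(x; D)]\big)^2}_{\text{bias}^2}
      + \underbrace{\mathrm{Var}_D\big[\hat{f}(x; D)\big]}_{\text{variance}}
      + \underbrace{\sigma^2}_{\text{noise}}

The classical expectation is that the variance term grows with model capacity, which is why the high-capacity regime was traditionally associated with poor generalisation.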
This thesis is devoted to investigating various aspects of underspecification in deep learning. First, we argue that deep learning models are underspecified on two levels: a) any given training dataset can be fit by many different functions, and b) any given function can be expressed by many different parameter configurations. We refer to the second kind of underspecification as parameterisation redundancy, and we precisely characterise its extent.
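As a concrete instance of parameterisation redundancy (a minimal sketch; the thesis characterises the full extent of such symmetries), ReLU networks admit positive rescaling symmetries: scaling a hidden unit's incoming weights by any c > 0 and its outgoing weights by 1/c leaves the network function unchanged, because relu(c·z) = c·relu(z) for c > 0.

    import numpy as np

    rng = np.random.default_rng(0)

    # One-hidden-layer ReLU network: f(x) = W2 @ relu(W1 @ x)
    W1 = rng.normal(size=(4, 3))   # incoming weights of 4 hidden units
    W2 = rng.normal(size=(1, 4))   # outgoing weights

    def relu(z):
        return np.maximum(z, 0.0)

    def f(x, W1, W2):
        return W2 @ relu(W1 @ x)

    # Rescale each hidden unit: incoming weights by c_i > 0, outgoing by 1/c_i.
    c = rng.uniform(0.5, 2.0, size=4)
    W1_alt = W1 * c[:, None]
    W2_alt = W2 / c[None, :]

    # Two different parameter configurations, identical function.
    x = rng.normal(size=3)
    assert np.allclose(f(x, W1, W2), f(x, W1_alt, W2_alt))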
Second, we characterise the implicit criteria (the inductive bias) that guide learning in the underspecified regime. Specifically, we consider a nonlinear but tractable classification setting, and show that given the choice, neural networks learn classifiers with a large margin.
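For context, a standard notion of margin (the thesis's exact definition in its tractable setting may differ) for a binary dataset \{(x_i, y_i)\}_{i=1}^n with y_i \in \{-1, +1\} and classifier f is

    \gamma(f) = \min_{1 \le i \le n} \, y_i f(x_i),

often normalised for linear classifiers f(x) = \langle w, x \rangle as

    \bar{\gamma}(w) = \min_{1 \le i \le n} \frac{y_i \langle w, x_i \rangle}{\lVert w \rVert}.

A large-margin solution separates the data with maximal slack, which is one way an inductive bias can single out a particular classifier among the many that fit the training set.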
Third, we consider learning scenarios where the inductive bias is not by itself sufficient to deal with underspecification. We then study different ways of ‘tightening the specification’: i) In the setting of representation learning with variational autoencoders, we propose a hand-crafted regulariser based on mutual information. ii) In the setting of binary classification, we consider soft-label (real-valued) supervision. We derive a generalisation bound for linear networks supervised in this way and verify that soft labels facilitate fast learning. Finally, we explore an application of soft-label supervision to the training of multi-exit models.
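To make the supervision signal concrete, here is a minimal sketch of soft-label binary classification (an illustration, not the thesis's exact loss or model): the target is a probability in [0, 1] rather than a hard 0/1 label, so each example carries strictly more information, which is one intuition for why soft labels can speed up learning.

    import numpy as np

    def soft_label_loss(logits, targets):
        # Binary cross-entropy against real-valued targets in [0, 1].
        # With hard labels, targets are exactly 0 or 1; soft labels carry
        # the full target probability for each example.
        p = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
        eps = 1e-12                        # numerical safety for log(0)
        return -np.mean(targets * np.log(p + eps)
                        + (1.0 - targets) * np.log(1.0 - p + eps))

    logits = np.array([2.0, -1.0, 0.5])
    hard_labels = np.array([1.0, 0.0, 1.0])
    soft_labels = np.array([0.9, 0.2, 0.7])  # hypothetical real-valued targets
    print(soft_label_loss(logits, hard_labels))
    print(soft_label_loss(logits, soft_labels))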
Publishing Year
2021
Date Published
2021-05-30
Publisher
Institute of Science and Technology Austria
Page
125
IST-REx-ID
9418
Cite this
Phuong, M. (2021). Underspecification in deep learning. Institute of Science and Technology Austria. https://doi.org/10.15479/AT:ISTA:9418
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Main File(s)
File Name
mph-thesis-v519-pdfimages.pdf
File Size
2.67 MB
Access Level
Open Access
Date Uploaded
2021-05-24
MD5 Checksum
4f0abe64114cfed264f9d36e8d1197e3
Source File
File Name
thesis.zip
File Size
93.00 MB
Access Level
Closed Access
Date Uploaded
2021-05-24
MD5 Checksum
f5699e876bc770a9b0df8345a77720a2
Material in ISTA:
Part of this Dissertation (3 related records)