_id,doi,title
14446,10.2478/msr-2023-0023,Against the flow of time with multi-output models
9416,,The inductive bias of ReLU networks on orthogonally separable data
9418,10.15479/AT:ISTA:9418,Underspecification in deep learning
7481,,Functional vs. parametric equivalence of ReLU networks
7479,10.1109/ICCV.2019.00144,Distillation-based training for multi-exit architectures
6569,,Towards understanding knowledge distillation
