High-dimensional limits in artificial neural networks (Record no. 768054)

MARC details
000 -LEADER
fixed length control field 04867ntm a22003857a 4500
003 - CONTROL NUMBER IDENTIFIER
control field AT-ISTA
005 - DATE AND TIME OF LATEST TRANSACTION
control field 20250915092030.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 250915s2024 au ||||| m||| 00| 0 eng d
040 ## - CATALOGING SOURCE
Transcribing agency ISTA
100 ## - MAIN ENTRY--PERSONAL NAME
Personal name Shevchenko, Aleksandr
9 (RLIN) 1084218
245 ## - TITLE STATEMENT
Title High-dimensional limits in artificial neural networks
260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Name of publisher, distributor, etc. Institute of Science and Technology Austria
Date of publication, distribution, etc. 2024
500 ## - GENERAL NOTE
General note Thesis
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note Abstract
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note Acknowledgements
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note About the Author
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note List of Collaborators and Publications
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note Table of Contents
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note List of Figures
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note 1 Introduction
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note 2 Background
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note 3 Landscape Connectivity and Dropout Stability of SGD Solutions
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note 4 Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note 5 Fundamental Limits of Two-layer Autoencoders
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note 6 Autoencoders: Beyond Gaussian Data
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note 7 Discussion and Concluding Remarks
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note Bibliography
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note A Appendix for Chapter 3
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note B Appendix for Chapter 4
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note C Appendix for Chapter 5
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note D Appendix for Chapter 6
520 ## - SUMMARY, ETC.
Summary, etc. In the modern age of machine learning, artificial neural networks have become an integral part of many practical systems. One of the key ingredients of the success of the deep learning approach is recent computational advances, which have allowed the training of models with billions of parameters on large-scale data. Such over-parameterized and data-hungry regimes pose a challenge for the theoretical analysis of modern models, since “classical” statistical wisdom is no longer applicable. It is therefore paramount to extend existing machinery, or develop new machinery, for analyzing neural networks in these challenging asymptotic regimes, which is the focus of this thesis.

Large neural network systems are usually optimized via “local” search algorithms, such as stochastic gradient descent (SGD). However, given the high-dimensional nature of the parameter space, it is a priori not clear why such a crude “local” approach works so remarkably well in practice. We take a step towards demystifying this phenomenon by showing that the landscape of the SGD training dynamics exhibits a few properties beneficial for optimization. First, we show that along the SGD trajectory an over-parameterized network is dropout stable. The emergence of dropout stability allows us to conclude that the minima found by SGD are connected via a continuous path of small loss. This in turn means that, due to mode connectivity, the high-dimensional landscape of the neural network optimization problem is provably not so unfavourable to gradient-based training.

Next, we show that SGD for an over-parameterized network tends to find solutions that are functionally “simpler”. Such SGD minima are more robust, since a less complicated solution is less likely to overfit the data. More formally, for a prototypical example of a wide two-layer ReLU network on a one-dimensional regression task, we show that the SGD algorithm is implicitly selective in its choice of an interpolating solution: at convergence, the neural network implements a piecewise linear function whose number of linear regions depends only on the amount of training data. This is in contrast to the “smooth” behaviour one would expect given such severe over-parameterization of the model.

Diverging from the generic supervised setting of classification and regression problems, we analyze an auto-encoder model of the kind commonly used for representation learning and data compression. Despite the wide applicability of the auto-encoding paradigm, the theoretical understanding of such models is limited even in the simplistic shallow case: related work is restricted to extreme asymptotic regimes in which the auto-encoder is either severely over-parameterized or severely under-parameterized. In contrast, we provide a tight characterization of the 1-bit compression of Gaussian signals in the challenging proportional regime, i.e., where the input dimension and the size of the compressed representation obey the same asymptotics. We also show that gradient-based methods are able to find a globally optimal solution, and that the predictions made for Gaussian data extrapolate beyond it, to the compression of natural images. Finally, we relax the Gaussian assumption and study more structured input sources. We show that the shallow model is sometimes agnostic to the structure of the data, which results in Gaussian-like behaviour, and we prove that making the decoding component slightly less shallow is already enough to escape this “curse” of Gaussian performance.
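To make the properties described in the summary concrete, the following is a minimal numerical sketch of dropout stability, under illustrative assumptions only (a toy two-layer ReLU network with mean-field 1/m read-out scaling and synthetic data; this is not the thesis's construction): deleting a random half of the hidden units and doubling the surviving read-out weights changes the squared loss only slightly, with a gap that shrinks as the width m grows.

```python
import numpy as np

def forward(x, w, a):
    """Two-layer ReLU network: inputs x (n, d), hidden weights w (m, d), read-out a (m,)."""
    return np.maximum(x @ w.T, 0.0) @ a

def loss(x, y, w, a):
    return np.mean((forward(x, w, a) - y) ** 2)

def dropout_half(w, a, rng):
    """Keep a random half of the hidden units and rescale the read-out by 2."""
    keep = rng.choice(len(a), size=len(a) // 2, replace=False)
    return w[keep], 2.0 * a[keep]

rng = np.random.default_rng(0)
n, d, m = 200, 10, 20_000                  # severe over-parameterization: m >> n
x = rng.normal(size=(n, d))
y = rng.normal(size=n)
w = rng.normal(size=(m, d))
a = np.abs(rng.normal(size=m)) / m         # mean-field 1/m scaling of the read-out

w_half, a_half = dropout_half(w, a, rng)
print("loss of the full network:", loss(x, y, w, a))
print("loss after dropping half:", loss(x, y, w_half, a_half))
```

In this toy setting the halved-and-rescaled network computes nearly the same function as the full one by the law of large numbers; dropout stability of this kind is what lets SGD minima be joined through a continuous low-loss path.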
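The piecewise-linearity statement can likewise be checked directly in one dimension. A sketch, again under illustrative assumptions (random weights rather than SGD-trained ones): f(x) = Σ_j a_j ReLU(w_j x + b_j) has at most one breakpoint (“knot”) per hidden unit, located at x = -b_j / w_j, and is exactly linear between consecutive knots.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 50                                     # hidden width
w, b = rng.normal(size=m), rng.normal(size=m)
a = rng.normal(size=m) / m

def f(x):
    """f(x) = sum_j a_j * relu(w_j * x + b_j), a piecewise linear function of x."""
    return np.maximum(np.outer(x, w) + b, 0.0) @ a

# Each unit j switches on or off at the knot x = -b_j / w_j; between
# consecutive knots f is exactly linear, so there are at most m + 1 regions.
knots = np.sort(-b / w)
mids = np.concatenate(([knots[0] - 1.0],
                       (knots[:-1] + knots[1:]) / 2.0,
                       [knots[-1] + 1.0]))
eps = 1e-6
slopes = (f(mids + eps) - f(mids - eps)) / (2.0 * eps)  # slope inside each region
print("linear regions:", len(mids))
print("distinct slopes:", len(np.unique(np.round(slopes, 8))))
```

The thesis's result is that after SGD training the number of occupied regions is governed by the number of training points, far below the worst-case count of m + 1 visible here.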
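Finally, the 1-bit compression setting can be illustrated in the proportional regime. The encoder and decoder below are assumptions made for illustration (a random sign encoder and a least-squares linear decoder, not the optimal construction characterized in the thesis): a Gaussian signal x in R^d is stored as sign(Wx) in {-1, +1}^k with the ratio k/d held fixed, and reconstructed linearly.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, n = 100, 150, 20_000                 # proportional regime: k / d is a constant
x = rng.normal(size=(n, d))                # Gaussian signals to compress
W = rng.normal(size=(k, d))                # illustrative random encoder weights
codes = np.sign(x @ W.T)                   # 1-bit compressed representations

# Fit a linear decoder B by least squares: minimize ||codes @ B - x||_F^2 over B.
B, *_ = np.linalg.lstsq(codes, x, rcond=None)
mse = np.mean((codes @ B - x) ** 2)
print("per-coordinate reconstruction MSE:", mse, "(signal variance: 1.0)")
```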
856 ## - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier https://doi.org/10.15479/at:ista:17465
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme Dewey Decimal Classification
Holdings
Lost status: Not Lost
Source of classification or shelving scheme: Dewey Decimal Classification
Home library: Library
Current library: Library
Date acquired: 15/09/2025
Full call number: Quiet Room
Barcode: AT-ISTA#003305
Date last seen: 16/09/2025
Price effective from: 15/09/2025
Koha item type: Book
