Diving into the mathematical framework of deep learning

There’s been an astronomical amount of hype, discussion, and debate about the prospects and dangers of machine learning and artificial intelligence. Geoffrey Hinton’s rather dystopian note of caution, prudent as it was, inadvertently drew my attention further into the field.

My first exposure to neural networks came towards the end of my freshman year of undergrad, when I discovered the excellent 3Blue1Brown video series on the subject. I didn’t delve too deeply at the time, partly because the whole approach to neural networks seemed rather piecemeal, pulling strategies from disparate fields, and partly because I didn’t have much in common with the software crowd (TensorFlow et al.). However, the undeniable success of GPT-4 in early 2023 made it hard to look away.

More recently, I stumbled upon a paper that presents a Riemannian geometry framework for neural networks. The framework is not widely known, but it offers a compelling mathematical structure:

Principles of Riemannian Geometry in Neural Networks (Michael Hauser and Asok Ray, NeurIPS 2017)

The paper describes neural networks as coordinate transformations with certain differentiability properties: the network is a succession of transformations, each mapping an input x^{(l)} to an output x^{(l+1)}.
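Concretely, for a network with L layers, the whole map is the composition of the per-layer transformations (my notation; the paper may index things differently):

x^{(L)} = \left( f^{(L-1)} \circ \cdots \circ f^{(0)} \right)\left( x^{(0)} \right)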

In a typical feedforward neural network, the transformation is defined by a function f, which is applied to the input x^{(l)} at each layer. This construction represents a \mathcal{C}^0 (continuous) coordinate transformation, with no explicit provision for smoothness or differentiability between layers.
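Written in the usual affine-plus-nonlinearity form, with weights W^{(l)}, biases b^{(l)}, and an activation \sigma (standard notation of mine, not necessarily the paper’s), each layer computes

x^{(l+1)} = \sigma\left( W^{(l)} x^{(l)} + b^{(l)} \right)

Beyond the continuity of \sigma, nothing ties the coordinates x^{(l+1)} to x^{(l)}; each layer is free to remap them wholesale.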

In contrast, a residual network adds a skip connection, directly adding the input x^{(l)} to the transformed output f(x^{(l)}). Such a network can be understood as a first-order forward difference approximation to a \mathcal{C}^1 (continuously differentiable) coordinate transformation, becoming exact in the limit of infinitely many layers.
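The forward-difference reading becomes clear on rearranging the residual update:

x^{(l+1)} = x^{(l)} + f(x^{(l)}) \quad \Longleftrightarrow \quad x^{(l+1)} - x^{(l)} = f(x^{(l)})

which is a unit-step forward difference (a forward-Euler step) for the flow \frac{dx}{dl} = f(x(l)). As the number of layers grows and the effective step size shrinks, the discrete updates trace out a smooth transformation of the input coordinates.

To make the contrast concrete, here is a minimal numpy sketch of the two update rules; the dimensions, tanh nonlinearity, and random weights are illustrative choices of mine, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, W, b):
    """One layer's map: tanh of an affine transform (an illustrative choice)."""
    return np.tanh(W @ x + b)

dim, depth = 4, 8
Ws = [0.1 * rng.standard_normal((dim, dim)) for _ in range(depth)]
bs = [np.zeros(dim) for _ in range(depth)]
x0 = rng.standard_normal(dim)

# Feedforward: each layer replaces the coordinates outright, x <- f(x).
x = x0
for W, b in zip(Ws, bs):
    x = f(x, W, b)

# Residual: each layer nudges the coordinates, x <- x + f(x),
# i.e. a unit-step forward-Euler update of dx/dl = f(x(l)).
r = x0
for W, b in zip(Ws, bs):
    r = r + f(r, W, b)

print("feedforward output:", x)
print("residual output:   ", r)
```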

I have yet to read the paper in detail, and I am uncertain how far this framework can be pushed in light of recent advances in the field. Nonetheless, it seems like a promising start, and I hope to improve my understanding over the summer weekends.
