Diving into the mathematical framework of deep learning

There’s been an astronomical amount of hype, discussion, and debate about the prospects and dangers of machine learning and artificial intelligence. Geoffrey Hinton’s rather dystopian note of caution, despite being prudent, inadvertently drew my attention further into this field.

My first exposure to neural networks was towards the end of my freshman year of undergrad, when I discovered the excellent video series by 3Blue1Brown on the subject. At that time, I didn’t delve too deeply, as the entire approach of neural networks seemed rather piecemeal, pulling strategies from various disparate fields, and also because I didn’t have much in common with the software crowd (TensorFlow et al.). However, it was hard to look away from the undeniable success of GPT-4 in early 2023.

More recently, I stumbled upon a paper that presents a Riemannian geometry framework for neural networks. It’s not a widely popular framework, but it offers a compelling mathematical structure:

Principles of Riemannian Geometry in Neural Networks

The paper describes neural networks as coordinate transformations with certain differentiability characteristics. It views the network as a succession of coordinate transformations mapping an input x^{(l)} to an output x^{(l+1)}.

In a typical feedforward neural network, the transformation is defined by a function f, which is applied to the input x^{(l)} at each layer. This construction represents a \mathcal{C}^0 (continuous) coordinate transformation, with no explicit provision for smoothness or differentiability between layers.

In contrast, a residual network adds a skip connection, allowing the direct addition of the input x^{(l)} to the transformed output f(x^{(l)}). This type of network can be understood as a first-order forward difference approximation to a \mathcal{C}^1 (continuously differentiable) coordinate transformation in the limit of infinitely many layers.
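As a toy illustration of this distinction, a residual block can be read as an explicit Euler step of an ODE in depth. Below is a minimal NumPy sketch; the layer function f (a tanh map with a randomly chosen weight matrix W) and the step size h are my own illustrative choices, not taken from the paper.

```python
import numpy as np

def f(x, W, b):
    """One layer's learned transformation (illustrative: a tanh layer)."""
    return np.tanh(W @ x + b)

def feedforward_step(x, W, b):
    # C^0 picture: the layer replaces the coordinates outright,
    # x^{(l+1)} = f(x^{(l)})
    return f(x, W, b)

def residual_step(x, W, b, h=1.0):
    # C^1 picture: the skip connection turns the layer into a forward
    # difference, x^{(l+1)} = x^{(l)} + h * f(x^{(l)}),
    # i.e. an explicit Euler step of the depth-ODE dx/dl = f(x).
    return x + h * f(x, W, b)

rng = np.random.default_rng(0)
W, b = 0.1 * rng.standard_normal((4, 4)), np.zeros(4)
x = rng.standard_normal(4)
for _ in range(10):   # ten residual layers ~ integrating to depth 10*h
    x = residual_step(x, W, b, h=0.1)
```

In this reading, stacking more residual layers with a smaller h refines the discretization of one fixed smooth coordinate transformation, which is exactly the limit the paper appeals to.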

I have yet to read the paper in detail, and I am uncertain how far this mathematical framework can be pushed, in light of the recent advancements in the field. Nonetheless, it seems like a promising start and I hope to improve my understanding over the summer weekends.

Types of von Neumann algebras

Theorem: The factors, i.e., von Neumann algebras \mathcal A whose center satisfies Z(\mathcal A) = \mathbb C, fall into 3 categories:

  1. Type I: These are the usual matrix algebras M_N(\mathbb C) (type I_N) and the algebra \mathcal B(H) with H separable and infinite-dimensional (type I_{\infty}).
  2. Type II: These are the \infty-dimensional factors having a trace \mathrm{tr}: \mathcal A \to \mathbb C (type II_1) and their tensor products with \mathcal B(H) (type II_{\infty}).
  3. Type III: These fall into several classes III_{\lambda} with \lambda \in [0, 1] and arise from type II_1 factors via crossed-product type constructions.

Reference: Introduction to Operator Algebras by Teo Banica.

Matrix properties that carry over to C*-algebra elements

There are some common properties of square matrices that carry over to elements of a C*-algebra.

  1. The norm of a matrix is at least as large as the modulus of each of its eigenvalues. This, in fact, follows from the definition of the operator norm.
  2. The eigenvalues of a unitary matrix have modulus 1.
  3. The eigenvalues of a self-adjoint matrix are real.
  4. The spectral radius of a normal matrix is equal to its norm. (See here.)
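These facts are easy to sanity-check numerically. The NumPy sketch below verifies properties 2–4 on random matrices; the particular constructions (taking the Hermitian part, QR for a unitary, a nilpotent counterexample for the non-normal case) are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# A random self-adjoint matrix: its eigenvalues should be real.
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
H = (A + A.conj().T) / 2
assert np.allclose(np.linalg.eigvals(H).imag, 0, atol=1e-10)

# A unitary matrix (from a QR factorization): eigenvalues on the unit circle.
Q, _ = np.linalg.qr(A)
assert np.allclose(np.abs(np.linalg.eigvals(Q)), 1)

# For a normal matrix (here H, self-adjoint hence normal),
# the spectral radius equals the operator (spectral) norm.
rho = np.max(np.abs(np.linalg.eigvals(H)))
assert np.isclose(rho, np.linalg.norm(H, 2))

# For a non-normal matrix the norm can strictly exceed the spectral radius.
N = np.array([[0.0, 1.0], [0.0, 0.0]])   # nilpotent: spectrum {0}, norm 1
assert np.max(np.abs(np.linalg.eigvals(N))) < np.linalg.norm(N, 2)
```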

Let \mathcal A be a C*-algebra. Given an element a \in \mathcal A, its spectral radius \rho(a) is the radius of the smallest disc centered at 0 that contains \sigma(a).

Theorems (Source)

  1. The spectrum of a norm 1 element lies in the unit disc.
  2. The spectrum of a unitary element (a^* = a^{-1}) is on the unit circle.
  3. The spectrum of a self-adjoint element (a = a^*) is real.
  4. The spectral radius \rho of a normal element (aa^* = a^*a) equals its norm.

Rational functional calculus

I just learned the term functional calculus, a part of functional analysis and spectral theory. Given a \in A (an algebra) and a rational function f = P/Q, one can construct f(a) = P(a)Q(a)^{-1}, provided Q(a) is invertible. The rational functional calculus formula \sigma(f(a)) = f(\sigma(a)) is valid for any f \in \mathbb C(X) having poles outside \sigma(a). I will get to the proof soon.
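The formula is easy to check numerically for matrices. Below is a small NumPy sanity check of \sigma(f(a)) = f(\sigma(a)) for a 2x2 matrix and a rational function whose pole (at 5) lies outside the spectrum; the particular matrix and f are my own illustrative choices.

```python
import numpy as np

a = np.array([[2.0, 1.0],
              [0.0, 3.0]])          # upper triangular, so sigma(a) = {2, 3}

def f_scalar(x):
    # f = P/Q with P(x) = x^2 + 1, Q(x) = x - 5; the pole 5 is not in sigma(a)
    return (x**2 + 1) / (x - 5)

def f_matrix(a):
    P = a @ a + np.eye(2)            # P(a) = a^2 + 1
    Q = a - 5 * np.eye(2)            # Q(a) = a - 5, invertible since 5 not in sigma(a)
    return P @ np.linalg.inv(Q)      # f(a) = P(a) Q(a)^{-1}

# sigma(f(a)) should equal f applied pointwise to sigma(a)
lhs = np.sort_complex(np.linalg.eigvals(f_matrix(a)))
rhs = np.sort_complex(f_scalar(np.linalg.eigvals(a)))
assert np.allclose(lhs, rhs)
```

Here f(2) = -5/3 and f(3) = -5, and the eigenvalues of f(a) come out to exactly those two values.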

I’m learning this material from Teo Banica’s excellent mini-lecture series on Quantum Groups.

Commutativity of spectrum on algebras with unit

Let \mathcal A be a \mathbb C-algebra with a unit, for instance the algebra of bounded linear operators on a Banach space. Then for a, b \in \mathcal A, \sigma(ab) \setminus \{0\} = \sigma(ba) \setminus \{0\}.

Definition: The spectrum of an element a \in \mathcal A is the set \sigma(a) = \{\lambda \in \mathbb C \mid a - \lambda \not \in \mathcal A^{-1}\}

where \mathcal A^{-1} \subset \mathcal A is the set of invertible elements.

For matrices, we obtain the eigenvalue set. For continuous functions, we obtain the image.

If \lambda \not \in \sigma(ab) \cup \{0\} then there is a c s.t. c(\lambda - ab) = 1 = (\lambda - ab)c.

Then one can verify that \lambda^{-1}(1 + bca) is the inverse of (\lambda - ba) so that \lambda \not \in \sigma(ba) \cup \{0\}:

(1 + bca)(\lambda - ba) = \lambda - ba + b(c\lambda - cab)a = \lambda - ba + ba = \lambda, using c(\lambda - ab) = 1; the product in the other order works the same way, using (\lambda - ab)c = 1.

This inverse can be guessed by an analogy with the geometric series (1 - x)^{-1}  = 1 + x + x^2 + \cdots as explained here.
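Both the identity and the inverse formula can be checked numerically for matrices. In the sketch below I deliberately take a and b rectangular, so that ab and ba have different sizes and their spectra can only differ at 0; the dimensions and the choice \lambda = 10 are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Rectangular a, b: ab is 3x3 while ba is 5x5, so ba must pick up
# extra zero eigenvalues, but the nonzero spectra should coincide.
a = rng.standard_normal((3, 5))
b = rng.standard_normal((5, 3))

ev_ab = np.linalg.eigvals(a @ b)                  # 3 eigenvalues
ev_ba = np.linalg.eigvals(b @ a)                  # 5 eigenvalues
nonzero_ab = np.sort_complex(ev_ab[np.abs(ev_ab) > 1e-9])
nonzero_ba = np.sort_complex(ev_ba[np.abs(ev_ba) > 1e-9])
assert np.allclose(nonzero_ab, nonzero_ba)

# Verify the guessed inverse: with c = (lam - ab)^{-1},
# lam^{-1} (1 + b c a) inverts (lam - ba).
lam = 10.0                                        # chosen outside sigma(ab)
c = np.linalg.inv(lam * np.eye(3) - a @ b)
inv_guess = (np.eye(5) + b @ c @ a) / lam
assert np.allclose(inv_guess @ (lam * np.eye(5) - b @ a), np.eye(5))
```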

Three basic results about C* algebras

  1. [Gelfand-Naimark Theorem] An arbitrary C*-algebra A is isometrically *-isomorphic to a C*-subalgebra of bounded operators on a Hilbert space. In other words, any C*-algebra can be realized as a norm-closed *-subalgebra of \mathcal B(H) for some Hilbert space H.
  2. [Gelfand Duality] Commutative unital C*-algebras are exactly those identified by Gelfand duality with algebras of continuous functions on compact Hausdorff topological spaces, i.e., C(X) for a compact Hausdorff space X.
  3. [Artin-Wedderburn Structure Theorem] Any finite-dimensional C*-algebra is a direct sum of matrix algebras \oplus_i M_{N_i}(\mathbb C).
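The direct sum in the third result can be realized concretely as block-diagonal matrices. The small NumPy sketch below (the block sizes 2 and 3 are my own illustrative choice) checks that the block structure is preserved under products and adjoints, so \oplus_i M_{N_i}(\mathbb C) sits inside a single matrix algebra as a *-subalgebra.

```python
import numpy as np

# Elements of M_2(C) (+) M_3(C), embedded as 5x5 block-diagonal matrices.
def embed(a2, a3):
    out = np.zeros((5, 5), dtype=complex)
    out[:2, :2] = a2
    out[2:, 2:] = a3
    return out

rng = np.random.default_rng(3)
x = embed(rng.standard_normal((2, 2)), rng.standard_normal((3, 3)))
y = embed(rng.standard_normal((2, 2)), rng.standard_normal((3, 3)))

# Products and adjoints never leak into the off-diagonal blocks,
# so the image of the embedding is a *-subalgebra of M_5(C).
for z in (x @ y, x.conj().T):
    assert np.allclose(z[:2, 2:], 0) and np.allclose(z[2:, :2], 0)
```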

Clifford group in mathematics vs. Clifford gates in quantum computing

Given a finite-dimensional vector space, V, and a quadratic form, \Phi, on V, the Clifford group of \Phi is the group

\Gamma(\Phi) = \{x \in \mathrm{Cl}(\Phi)^* \mid \alpha(x) v x^{-1} \in V \text{ for all } v \in V\}

where \alpha is the canonical automorphism (grade involution) of the Clifford algebra.

I do not know how much of the Clifford algebra literature Daniel Gottesman was familiar with when he defined the Clifford gates as the group of unitaries that normalize the Pauli group \mathcal P_n, i.e., \mathcal C_n = \{U \mid U \mathcal P_n U^\dagger = \mathcal P_n\},

but I think the definitions match up if we consider the vector space V to be the vector space spanned by the multi-Paulis and the unitary matrices as the Clifford algebra on this vector space — this is plausible because the multi-Paulis form a complete basis for the unitary matrices.
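The single-qubit case of the normalizer condition is easy to verify numerically: conjugation by H or S sends each Pauli to another Pauli up to a phase, whereas a non-Clifford gate such as T does not. Below is a NumPy sketch; the phase set and the membership test are my own minimal formalization of "Pauli up to phase".

```python
import numpy as np

# Single-qubit Paulis and two standard Clifford generators, H and S.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)

paulis = [I, X, Y, Z]
phases = [1, -1, 1j, -1j]

def in_pauli_group(M):
    """True if M is a Pauli matrix up to a phase in {1, -1, i, -i}."""
    return any(np.allclose(M, ph * P) for P in paulis for ph in phases)

# H and S normalize the Pauli group: U P U^dagger is again a Pauli.
for U in (H, S):
    for P in (X, Y, Z):
        assert in_pauli_group(U @ P @ U.conj().T)

# A non-Clifford gate, e.g. T = diag(1, e^{i pi/4}), fails the test on X.
T = np.diag([1, np.exp(1j * np.pi / 4)])
assert not in_pauli_group(T @ X @ T.conj().T)
```

For example, H X H^\dagger = Z and S X S^\dagger = Y, both inside the Pauli group, while T X T^\dagger has off-diagonal entries with unequal phases and lies outside it.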

I do not understand the relation between these concepts well enough yet, but this is a start.