Diving into the mathematical framework of deep learning

There’s been an astronomical amount of hype, discussion, and debate about the prospects and dangers of machine learning and artificial intelligence. Geoffrey Hinton’s rather dystopian note of caution, despite being prudent, inadvertently drew my attention further into this field.

My first exposure to neural networks was towards the end of my freshman year of undergrad, when I discovered the excellent video series by 3Blue1Brown on the subject. At that time, I didn’t delve too deeply, as the entire approach of neural networks seemed rather piecemeal, pulling strategies from various disparate fields, and also because I didn’t have much in common with the software crowd (TensorFlow et al.). However, it was hard to look away from the undeniable success of GPT-4 in early 2023.

More recently, I stumbled upon a paper that presents a Riemannian geometry framework for neural networks. It’s not a widely popular framework, but it offers a compelling mathematical structure:

Principles of Riemannian Geometry in Neural Networks

The paper describes neural networks as coordinate transformations with certain differentiability characteristics. It views the network as a succession of coordinate transformations mapping an input x^{(l)} to an output x^{(l+1)}.

In a typical feedforward neural network, the transformation is defined by a function f, which is applied to the input x^{(l)} at each layer. This construction represents a \mathcal{C}^0 (continuous) coordinate transformation, with no explicit provision for smoothness or differentiability between layers.

In contrast, a residual network adds a skip connection, allowing the direct addition of the input x^{(l)} to the transformed output f(x^{(l)}). This type of network can be understood as a first-order forward difference approximation to a \mathcal{C}^1 (continuously differentiable) coordinate transformation in the limit of infinitely many layers.
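As a toy illustration of this distinction, a residual block can be read as an explicit Euler step of an ODE in depth. Below is a minimal NumPy sketch; the layer function f (a tanh map with a randomly chosen weight matrix W) and the step size h are my own illustrative choices, not taken from the paper.

```python
import numpy as np

def f(x, W, b):
    """One layer's learned transformation (illustrative: a tanh layer)."""
    return np.tanh(W @ x + b)

def feedforward_step(x, W, b):
    # C^0 picture: the layer replaces the coordinates outright,
    # x^{(l+1)} = f(x^{(l)})
    return f(x, W, b)

def residual_step(x, W, b, h=1.0):
    # C^1 picture: the skip connection turns the layer into a forward
    # difference, x^{(l+1)} = x^{(l)} + h * f(x^{(l)}),
    # i.e. an explicit Euler step of the depth-ODE dx/dl = f(x).
    return x + h * f(x, W, b)

rng = np.random.default_rng(0)
W, b = 0.1 * rng.standard_normal((4, 4)), np.zeros(4)
x = rng.standard_normal(4)
for _ in range(10):   # ten residual layers ~ integrating to depth 10*h
    x = residual_step(x, W, b, h=0.1)
```

In this reading, stacking more residual layers with a smaller h refines the discretization of one fixed smooth coordinate transformation, which is exactly the limit the paper appeals to.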

I have yet to read the paper in detail, and I am uncertain how far this mathematical framework can be pushed, in light of the recent advancements in the field. Nonetheless, it seems like a promising start and I hope to improve my understanding over the summer weekends.

Types of von Neumann algebras

Theorem: The factors, i.e., von Neumann algebras \mathcal A whose center satisfies Z(\mathcal A) = \mathbb C, fall into 3 categories:

  1. Type I: These are the usual matrix algebras M_N(\mathbb C) (type I_N) and the algebra \mathcal B(H) with H separable and infinite-dimensional (type I_{\infty}).
  2. Type II: These are the \infty-dimensional factors having a trace \mathrm{tr}: \mathcal A \to \mathbb C (type II_1) and their tensor products with \mathcal B(H) (type II_{\infty}).
  3. Type III: These fall into several classes III_{\lambda} with \lambda \in [0, 1] and arise from type II_1 factors via crossed-product type constructions.

Reference: Introduction to Operator Algebras by Teo Banica.

Matrix properties that carry over to C*-algebra elements

There are some common properties of square matrices that carry over to elements of a C*-algebra.

  1. The norm of a matrix is at least as large as the modulus of each of its eigenvalues. This, in fact, follows from the definition of the operator norm.
  2. The eigenvalues of a unitary matrix have modulus 1.
  3. The eigenvalues of a self-adjoint matrix are real.
  4. The spectral radius of a normal matrix is equal to its norm. (See here.)
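These facts are easy to sanity-check numerically. The NumPy sketch below verifies properties 2–4 on random matrices; the particular constructions (taking the Hermitian part, QR for a unitary, a nilpotent counterexample for the non-normal case) are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# A random self-adjoint matrix: its eigenvalues should be real.
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
H = (A + A.conj().T) / 2
assert np.allclose(np.linalg.eigvals(H).imag, 0, atol=1e-10)

# A unitary matrix (from a QR factorization): eigenvalues on the unit circle.
Q, _ = np.linalg.qr(A)
assert np.allclose(np.abs(np.linalg.eigvals(Q)), 1)

# For a normal matrix (here H, self-adjoint hence normal),
# the spectral radius equals the operator (spectral) norm.
rho = np.max(np.abs(np.linalg.eigvals(H)))
assert np.isclose(rho, np.linalg.norm(H, 2))

# For a non-normal matrix the norm can strictly exceed the spectral radius.
N = np.array([[0.0, 1.0], [0.0, 0.0]])   # nilpotent: spectrum {0}, norm 1
assert np.max(np.abs(np.linalg.eigvals(N))) < np.linalg.norm(N, 2)
```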

Let \mathcal A be a C*-algebra. Given an element a \in \mathcal A, its spectral radius \rho(a) is the radius of the smallest disc centered at 0 that contains \sigma(a).

Theorems (Source)

  1. The spectrum of a norm 1 element lies in the unit disc.
  2. The spectrum of a unitary element (a^* = a^{-1}) is on the unit circle.
  3. The spectrum of a self-adjoint element (a = a^*) is real.
  4. The spectral radius \rho of a normal element (aa^* = a^*a) equals its norm.

Rational functional calculus

I just learned the term functional calculus, a part of functional analysis and spectral theory. Given a \in A (an algebra) and a rational function f = P/Q, one can construct f(a) = P(a)Q(a)^{-1}, provided Q(a) is invertible. The rational functional calculus formula \sigma(f(a)) = f(\sigma(a)) is valid for any f \in \mathbb C(X) having poles outside \sigma(a). I will get to the proof soon.
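The formula is easy to check numerically for matrices. Below is a small NumPy sanity check of \sigma(f(a)) = f(\sigma(a)) for a 2x2 matrix and a rational function whose pole (at 5) lies outside the spectrum; the particular matrix and f are my own illustrative choices.

```python
import numpy as np

a = np.array([[2.0, 1.0],
              [0.0, 3.0]])          # upper triangular, so sigma(a) = {2, 3}

def f_scalar(x):
    # f = P/Q with P(x) = x^2 + 1, Q(x) = x - 5; the pole 5 is not in sigma(a)
    return (x**2 + 1) / (x - 5)

def f_matrix(a):
    P = a @ a + np.eye(2)            # P(a) = a^2 + 1
    Q = a - 5 * np.eye(2)            # Q(a) = a - 5, invertible since 5 not in sigma(a)
    return P @ np.linalg.inv(Q)      # f(a) = P(a) Q(a)^{-1}

# sigma(f(a)) should equal f applied pointwise to sigma(a)
lhs = np.sort_complex(np.linalg.eigvals(f_matrix(a)))
rhs = np.sort_complex(f_scalar(np.linalg.eigvals(a)))
assert np.allclose(lhs, rhs)
```

Here f(2) = -5/3 and f(3) = -5, and the eigenvalues of f(a) come out to exactly those two values.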

I’m learning this material from Teo Banica’s excellent mini-lecture series on Quantum Groups.

Commutativity of spectrum on algebras with unit

Let \mathcal A be a \mathbb C-algebra with a unit, for instance the algebra of bounded linear operators on a Banach space. Then for a, b \in \mathcal A, \sigma(ab) \setminus \{0\} = \sigma(ba) \setminus \{0\}.

Definition: The spectrum of an element a \in \mathcal A is the set \sigma(a) = \{\lambda \in \mathbb C \mid a - \lambda \not \in \mathcal A^{-1}\}

where \mathcal A^{-1} \subset \mathcal A is the set of invertible elements.

For matrices, we obtain the eigenvalue set. For continuous functions, we obtain the image.

If \lambda \not \in \sigma(ab) \cup \{0\} then there is a c s.t. c(\lambda - ab) = 1 = (\lambda - ab)c.

Then one can verify that \lambda^{-1}(1 + bca) is the inverse of (\lambda - ba) so that \lambda \not \in \sigma(ba) \cup \{0\}:

(1 + bca)(\lambda - ba) = \lambda - ba + b(c\lambda - cab)a = \lambda - ba + ba = \lambda, using c(\lambda - ab) = 1; the product in the other order works the same way, using (\lambda - ab)c = 1.

This inverse can be guessed by an analogy with the geometric series (1 - x)^{-1}  = 1 + x + x^2 + \cdots as explained here.
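Both the identity and the inverse formula can be checked numerically for matrices. In the sketch below I deliberately take a and b rectangular, so that ab and ba have different sizes and their spectra can only differ at 0; the dimensions and the choice \lambda = 10 are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Rectangular a, b: ab is 3x3 while ba is 5x5, so ba must pick up
# extra zero eigenvalues, but the nonzero spectra should coincide.
a = rng.standard_normal((3, 5))
b = rng.standard_normal((5, 3))

ev_ab = np.linalg.eigvals(a @ b)                  # 3 eigenvalues
ev_ba = np.linalg.eigvals(b @ a)                  # 5 eigenvalues
nonzero_ab = np.sort_complex(ev_ab[np.abs(ev_ab) > 1e-9])
nonzero_ba = np.sort_complex(ev_ba[np.abs(ev_ba) > 1e-9])
assert np.allclose(nonzero_ab, nonzero_ba)

# Verify the guessed inverse: with c = (lam - ab)^{-1},
# lam^{-1} (1 + b c a) inverts (lam - ba).
lam = 10.0                                        # chosen outside sigma(ab)
c = np.linalg.inv(lam * np.eye(3) - a @ b)
inv_guess = (np.eye(5) + b @ c @ a) / lam
assert np.allclose(inv_guess @ (lam * np.eye(5) - b @ a), np.eye(5))
```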

Three basic results about C* algebras

  1. [Gelfand-Naimark Theorem] An arbitrary C*-algebra A is isometrically *-isomorphic to a C*-subalgebra of bounded operators on a Hilbert space. In other words, any C*-algebra can be realized as a norm-closed *-subalgebra of \mathcal B(H) for some Hilbert space H.
  2. [Gelfand Duality] Commutative unital C*-algebras are exactly those identified by Gelfand duality with algebras of continuous functions on compact Hausdorff topological spaces, i.e., C(X) for a compact Hausdorff space X.
  3. [Artin-Wedderburn Structure Theorem] Any finite-dimensional C*-algebra is a direct sum of matrix algebras \oplus_i M_{N_i}(\mathbb C).
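The direct sum in the third result can be realized concretely as block-diagonal matrices. The small NumPy sketch below (the block sizes 2 and 3 are my own illustrative choice) checks that the block structure is preserved under products and adjoints, so \oplus_i M_{N_i}(\mathbb C) sits inside a single matrix algebra as a *-subalgebra.

```python
import numpy as np

# Elements of M_2(C) (+) M_3(C), embedded as 5x5 block-diagonal matrices.
def embed(a2, a3):
    out = np.zeros((5, 5), dtype=complex)
    out[:2, :2] = a2
    out[2:, 2:] = a3
    return out

rng = np.random.default_rng(3)
x = embed(rng.standard_normal((2, 2)), rng.standard_normal((3, 3)))
y = embed(rng.standard_normal((2, 2)), rng.standard_normal((3, 3)))

# Products and adjoints never leak into the off-diagonal blocks,
# so the image of the embedding is a *-subalgebra of M_5(C).
for z in (x @ y, x.conj().T):
    assert np.allclose(z[:2, 2:], 0) and np.allclose(z[2:, :2], 0)
```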

Clifford group in mathematics vs. Clifford gates in quantum computing

Given a finite-dimensional vector space, V, and a quadratic form, \Phi, on V, the Clifford group of \Phi is the group

\Gamma(\Phi) = \{x \in \mathrm{Cl}(\Phi)^* \mid \alpha(x) v x^{-1} \in V \text{ for all } v \in V\}

where \alpha is the canonical automorphism (grade involution) of the Clifford algebra.

I do not know how much of the Clifford algebra literature Daniel Gottesman was familiar with when he defined the Clifford gates as the group of unitaries that normalize the Pauli group \mathcal P_n, i.e., \mathcal C_n = \{U \mid U \mathcal P_n U^\dagger = \mathcal P_n\},

but I think the definitions match up if we consider the vector space V to be the vector space spanned by the multi-Paulis and the unitary matrices as the Clifford algebra on this vector space — this is plausible because the multi-Paulis form a complete basis for the unitary matrices.
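The single-qubit case of the normalizer condition is easy to verify numerically: conjugation by H or S sends each Pauli to another Pauli up to a phase, whereas a non-Clifford gate such as T does not. Below is a NumPy sketch; the phase set and the membership test are my own minimal formalization of "Pauli up to phase".

```python
import numpy as np

# Single-qubit Paulis and two standard Clifford generators, H and S.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)

paulis = [I, X, Y, Z]
phases = [1, -1, 1j, -1j]

def in_pauli_group(M):
    """True if M is a Pauli matrix up to a phase in {1, -1, i, -i}."""
    return any(np.allclose(M, ph * P) for P in paulis for ph in phases)

# H and S normalize the Pauli group: U P U^dagger is again a Pauli.
for U in (H, S):
    for P in (X, Y, Z):
        assert in_pauli_group(U @ P @ U.conj().T)

# A non-Clifford gate, e.g. T = diag(1, e^{i pi/4}), fails the test on X.
T = np.diag([1, np.exp(1j * np.pi / 4)])
assert not in_pauli_group(T @ X @ T.conj().T)
```

For example, H X H^\dagger = Z and S X S^\dagger = Y, both inside the Pauli group, while T X T^\dagger has off-diagonal entries with unequal phases and lies outside it.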

I do not understand the relation between these concepts well enough yet, but this is a start.