## May 17, 2022

### The Magnitude of Information

#### Posted by Tom Leinster

Guest post by Heiko Gimperlein, Magnus Goffeng and Nikoletta Louca

The magnitude of a metric space $(X,d)$ does not require further introduction on this blog. Two of the hosts, Tom Leinster and Simon Willerton, conjectured that the magnitude function $\mathcal{M}_X(R) := \mathrm{Mag}(X,R \cdot \mathrm{d})$ of a convex body $X \subset \mathbb{R}^n$ with Euclidean distance $\mathrm{d}$ captures classical geometric information about $X$:

$\begin{aligned} \mathcal{M}_X(R) &= \frac{1}{n! \omega_n} \mathrm{vol}_n(X)\ R^n + \frac{1}{2(n-1)! \omega_{n-1}} \mathrm{vol}_{n-1}(\partial X)\ R^{n-1} + \cdots + 1 \\ &= \frac{1}{n! \omega_n} \sum_{j=0}^n c_j(X)\ R^{n-j} \end{aligned}$

where $c_j(X) = \gamma_{j,n} V_j(X)$ is proportional to the $j$-th intrinsic volume $V_j$ of $X$ and $\omega_n$ is the volume of the unit ball in $\mathbb{R}^n$.
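
As a small illustration, the two leading coefficients of the conjectured expansion can be evaluated for the unit ball in $\mathbb{R}^n$. This is only a sketch of the arithmetic: the function names are made up for this example, and it uses the standard facts $\omega_n = \pi^{n/2}/\Gamma(n/2+1)$ and $\mathrm{vol}_{n-1}(\partial B_n) = n\,\omega_n$.

```python
import math


def unit_ball_volume(n: int) -> float:
    """Volume omega_n of the unit ball in R^n (omega_0 = 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def conjectured_top_coefficients(n: int) -> tuple[float, float]:
    """Coefficients of R^n and R^(n-1) in the conjectured formula,
    evaluated for the unit ball in R^n (hypothetical helper)."""
    omega_n = unit_ball_volume(n)
    omega_nm1 = unit_ball_volume(n - 1)
    vol = omega_n        # vol_n of the unit ball
    area = n * omega_n   # vol_{n-1} of the unit sphere
    lead = vol / (math.factorial(n) * omega_n)
    sub = area / (2 * math.factorial(n - 1) * omega_nm1)
    return lead, sub


for n in (1, 2, 3):
    lead, sub = conjectured_top_coefficients(n)
    print(f"n={n}: R^{n} coefficient {lead:.6f}, R^{n - 1} coefficient {sub:.6f}")
```

For $n = 2$ the second value is $\pi/2 \approx 1.571$, which can be set against the coefficient $\frac{3}{2}$ of $R$ in the disk asymptotics quoted further down in this post.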

Even more basic geometric questions have remained open, including:

• What geometric content is encoded in $\mathcal{M}_X$?
• What can be said about the magnitude function of the unit disk $B_2 \subset \mathbb{R}^2$?

We discuss in this post how these questions led us to possible relations to information geometry. We would love to hear from you:

• Is magnitude an interesting invariant for information geometry?
• Is there a category theoretic motivation, like Lawvere’s view of a metric space as an enriched category?
• Does the magnitude relate to notions studied in information geometry?

Recent years have seen much progress in understanding the geometric content of the magnitude function for domains in odd-dimensional Euclidean space. In this setting Meckes and Barceló–Carbery showed how to compute magnitude using differential equations. Nevertheless, as Carbery often emphasized, hardly anything was known even for such simple geometries as the unit disk $B_2$ in $\mathbb{R}^2$.

Our new works Semiclassical analysis of a nonlocal boundary value problem related to magnitude and The magnitude and spectral geometry show, in particular, that as $R\to \infty$,

$\mathcal{M}_{B_2}(R) = \frac{1}{2}R^2 + \frac{3}{2}R + \frac{9}{8}+O(R^{-1}),$

and that $\mathcal{M}_{B_2}(R)$ is not a polynomial.

The approach does not use differential equations, but methods for integral equations. Recall that the magnitude function of a positive definite compact metric space $(X,\mathrm{d})$ is defined as

$\mathcal{M}_X(R):=\int_X u_R(x) \ \mathrm{d}x,$

where $u_R$ is the unique distribution supported in $X$ that solves the integral equation

$\int_X \mathrm{e}^{-R\mathrm{d}(x,y)}u_R(y) \ \mathrm{d}y=1.$
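
To make the defining equation concrete, here is a minimal numerical sketch for a finite sample of an interval. It is not the method of the papers (which analyze the integral equation directly). It only uses the standard definition of magnitude for a finite metric space, where a weighting $w$ solving $Zw = 1$ with $Z_{ij} = \mathrm{e}^{-R\,\mathrm{d}(x_i,x_j)}$ plays the role of $u_R$. The function name `finite_magnitude`, the grid sizes and the values of `length` and `R` are assumptions chosen for illustration.

```python
import numpy as np


def finite_magnitude(points: np.ndarray, R: float) -> float:
    """Magnitude of a finite subset of the real line, with distances scaled by R."""
    # Similarity matrix Z_ij = exp(-R * |x_i - x_j|).
    Z = np.exp(-R * np.abs(points[:, None] - points[None, :]))
    # A weighting w solves Z w = 1; it is the discrete analogue of u_R.
    w = np.linalg.solve(Z, np.ones(len(points)))
    # The magnitude is the total mass of the weighting.
    return float(w.sum())


length, R = 2.0, 1.0
for n in (11, 101, 401):
    xs = np.linspace(0.0, length, n)
    print(n, finite_magnitude(xs, R))
print("line segment value 1 + R*length/2 =", 1 + R * length / 2)
```

As the sample is refined, the printed values should approach $1 + R\ell/2$, the known magnitude function of a line segment of length $\ell$.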

We analyze this integral equation in geometric settings: for domains $X$ in $\mathbb{R}^n$, in spheres, in tori or, more generally, in a manifold with boundary of any dimension equipped with a distance function, as illustrated in the figures above. Our results shed light on the geometric content of their magnitude. In fact, our results apply beyond classical geometry and metric spaces: not even a distance function is needed!

Our techniques suggest a life of magnitude beyond metric spaces, in information geometry. There one considers a statistical manifold, i.e. a smooth manifold $X$ equipped with a divergence $D$ rather than a distance function; see John Baez’s posts on information geometry. A first example of a divergence is the square of the subspace distance $|x-y|^2$ on a submanifold in Euclidean space. A second example is the square of the geodesic distance function on a Riemannian manifold $(X,g)$, provided that it is smooth. (Note that on a circle the distance function and its square are non-smooth when $x$ and $y$ are conjugate points.) In general, a divergence $D=D(x,y)$ is a smooth, non-negative function that agrees with a Riemannian metric near $x=y$ modulo higher-order terms, in the sense that

$D(x,x+v)=g_{D,x}(v,v)+O(|v|^3)$

for a Riemannian metric $g_D$ on $X$.

Divergences related to relative entropy have long been used in statistics to study families of probability measures. The relative entropy of two probability measures $\mu$ and $\nu$ on a space $\Omega$ is defined as

$D(\mu,\nu):=\int_\Omega \log\left(\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\right)\mathrm{d}\nu\in [0,\infty].$

The notion of relative entropy and its cousins are discussed in the blog posts of Baez mentioned above and also in Leinster’s book Entropy and Diversity: The Axiomatic Approach. While the space of probability measures is too big, one can restrict to interesting submanifolds (with boundary).

Here is the definition of the magnitude function of a statistical manifold with boundary $(X,D)$, when $R \gg 0$ is sufficiently large:

$\mathcal{M}_{(X,D)}(R):=\int_X u_R(x) \ \mathrm{d}x,$

where $u_R$ is the unique distribution supported in $X$ that solves the integral equation

$\int_X \mathrm{e}^{-R\sqrt{D(x,y)}}u_R(y) \ \mathrm{d}y=1.$

When $D$ is the square of a distance function on $X$, we recover the magnitude of the metric space $(X,\sqrt{D})$.

We emphasize two key points to take home:

1. The integral equation approach is equivalent to defining the magnitude of statistical manifolds using Meckes’s classical approach in Magnitude, diversity, capacities, and dimensions of metric spaces, relying on reproducing kernel Hilbert spaces.

2. Since $D$ is smooth, $\mathcal{M}_{(X,D)}$ shares the properties stated in The magnitude and spectral geometry for the magnitude function summarized next.

Theorem

a. The magnitude is well-defined for $R\gg 0$ sufficiently large; there the integral equation admits a unique distributional solution.

b. $\mathcal{M}_{(X,D)}$ extends meromorphically to the complex plane.

c. The asymptotic behavior of $\mathcal{M}_{(X,D)}(R) = \frac{1}{n! \omega_n} \sum_{j=0}^\infty c_j(X) R^{n-j}+O(R^{-\infty})$ is determined by the Taylor coefficients of $v\mapsto D(x,x+v)$ and $n=\mathrm{dim} X$.

Further details and explicit computations of the first few terms $c_j(X)$ can be found in The magnitude and spectral geometry. For a Riemannian manifold, $c_0$ is proportional to the volume of $X$, while $c_1$ is proportional to the surface area of $\partial X$. The coefficient $c_2$ involves the integral of the scalar curvature of $X$ and the integral of the mean curvature of $\partial X$. All these computations are relative to $D$ and the Riemannian metric that $D$ defines. For Euclidean domains $X \subset \mathbb{R}^n$, $c_3$ is proportional to the Willmore energy of $\partial X$ (proven with older technology in another paper, The Willmore energy and the magnitude of Euclidean domains). We note that

**in all known computations of asymptotics for Euclidean domains $X \subset \mathbb{R}^n$, $c_j(X)$ is proportional to $\int_{\partial X}H^{j-1}\mathrm{d}S$ for $j\gt 0$.**

Here $H$ denotes the mean curvature of $\partial X$. Lower-order terms can be computed by an iterative scheme, for as long as you have the time. In fact, we have written Python code that computes $c_j$ for any $j$; it is available at arXiv:2201.11363.

We would love to hear from you should you have any thoughts on the following questions:

• Is magnitude an interesting invariant for information geometry?

• Is there a category theoretic motivation, like Lawvere’s view of a metric space as an enriched category?

• Does the magnitude relate to notions studied in information geometry?

Posted at May 17, 2022 11:44 AM UTC


### Re: The Magnitude of Information

There’s loads of good stuff here, but let me highlight one particular point: finally, we know something about the magnitude of a Euclidean disc!

Some context:

• We’ve known from the start that the magnitude of a 1-dimensional ball — that is, a line segment — is a polynomial of degree 1 in its radius (or length).

• Juan Antonio Barceló and Tony Carbery showed that the magnitude of a 3-dimensional ball is a polynomial of degree 3 in its radius.

• They also showed that in odd dimensions $d \geq 5$, the magnitude of a $d$-dimensional ball is a rational function in the radius, but not a polynomial.

• And we’ve known for a long time that for every $d$, the magnitude of the $d$-dimensional ball of radius $R$ grows like $R^d$ as $R \to \infty$. This explains the degrees of the polynomials just mentioned.

What we didn’t know much about until Heiko, Magnus and Nikoletta’s work is the magnitude of even-dimensional balls. As they say in the post above, the 2-dimensional ball of radius $R$ has magnitude

$\frac{1}{2}R^2 + \frac{3}{2} R + \frac{9}{8} + O(R^{-1})$

as $R \to \infty$, and it’s not a polynomial in $R$!

So we have a funny situation:

• the magnitude of a 1-dimensional ball is a polynomial in its radius;

• the magnitude of a 3-dimensional ball is a polynomial in its radius;

• but the magnitude of a 2-dimensional ball is not.

I have a very rough idea of why that’s the case, but really not what you could call a conceptual explanation. Finding a conceptual explanation seems like a real challenge.

Posted by: Tom Leinster on May 18, 2022 12:11 AM

### Re: The Magnitude of Information

You’ve shown that the magnitude function of a 2-disc is not a polynomial. Do you know whether it’s a rational function?

Posted by: Tom Leinster on May 18, 2022 12:16 AM

### Re: The Magnitude of Information

Unfortunately, our methods do not allow us to say whether or not the magnitude function of the unit disk is rational.

Except for odd-dimensional balls, little seems to be known about algebraic properties of the magnitude function. The simple formulas for the first expansion coefficients $c_j(X)$ of a general domain $X$ in $\mathbb{R}^n$ (see the bold-faced “note” after the theorem) might hint at interesting but hidden structure.

Posted by: Heiko Gimperlein on May 18, 2022 6:12 AM

### Re: The Magnitude of Information

Thanks.

I wonder whether Simon has thoughts on what rational function it might be if it is a rational function, given the intensive study he’s done of the coefficients of these functions for odd-dimensional balls.

Posted by: Tom Leinster on May 18, 2022 9:34 AM

### Re: The Magnitude of Information

Looking at the information geometry aspect of this post, I see that there’s a very simple question to which I don’t know the answer. I’m guessing no one does, but please correct me if I’m wrong.

Take the unit interval $[0, 1]$. We know its magnitude with respect to the usual distance. But what is its magnitude if we replace the usual distance by the square root of relative entropy, regarding $[0, 1]$ as the set of probability distributions on a two-element set? In other words, for $p, q \in [0, 1]$, we’re replacing $|p - q|$ by

$\rho(p, q) := \left\{ p \log \biggl( \frac{p}{q} \biggr) + (1 - p) \log \biggl( \frac{1 - p}{1 - q} \biggr) \right\}^{1/2}.$

Or, as usual, we can ask the same question with a scale factor introduced, i.e. replacing $\rho(p, q)$ by $R \rho(p, q)$ for a variable factor $R \gt 0$.

I think this is the simplest case of the question that Heiko, Magnus and Nikoletta are asking towards the end of the post.

The theorem there gives us some information, but only some. E.g. it tells us that the large-$R$ asymptotics of the magnitude function are the same as those of an ordinary interval, right? But perhaps we can say something about non-large $R$. Perhaps there’s even a closed-form formula.
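
A quick numerical experiment along these lines might look as follows. This is only a sketch under stated assumptions: it computes the magnitude of a finite sample of points, using $Z_{ij} = \mathrm{e}^{-R\rho(p_i, p_j)}$ and a weighting $w$ with $Zw = 1$, and whether such sample values approximate the continuum quantity is not something established here. The names `rho` and `sample_magnitude`, the cutoff `eps` (which keeps the sample away from $0$ and $1$, where the logarithms are undefined), the sample size and the values of `R` are all illustrative choices.

```python
import numpy as np


def rho(p, q):
    """Square root of the relative entropy of Bernoulli(p) with respect to Bernoulli(q)."""
    kl = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
    # Clip tiny negative rounding errors before taking the square root.
    return np.sqrt(np.maximum(kl, 0.0))


def sample_magnitude(points: np.ndarray, R: float) -> float:
    """Magnitude of a finite sample, with rho (scaled by R) in place of a distance."""
    # Z is not symmetric in general, because rho is not symmetric.
    Z = np.exp(-R * rho(points[:, None], points[None, :]))
    # Weighting: solve Z w = 1, then sum the weights.
    w = np.linalg.solve(Z, np.ones(len(points)))
    return float(w.sum())


eps = 0.05  # assumption: stay inside [eps, 1 - eps]
ps = np.linspace(eps, 1 - eps, 50)
for R in (1.0, 5.0, 20.0):
    print(R, sample_magnitude(ps, R))
```

No expected values are given, since the point of the experiment would be to see what comes out.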

Posted by: Tom Leinster on May 22, 2022 10:58 AM

### Re: The Magnitude of Information

Thanks for this interesting question Tom!

At the current state of affairs, there might be more hope for explicit computations in this example than for any statements from our methods. The example you give is not smooth at the endpoints $q=0$ and $q=1$, so our methods cannot say anything about the boundary terms in that case. The methods can describe the interior behavior (up to arbitrary order in $R$), and provide complete asymptotics on any proper subdomain $[\epsilon,1-\delta]$. Perhaps I am overly negative, but the endpoints would require heavier hammers than we are wielding.

From a bigger perspective, it might be that your example showcases a potential issue with applications of magnitude to information geometry: that the extreme points do not behave that nicely in a divergence sense. Or am I extrapolating too much from your example?

Posted by: Magnus Goffeng on May 22, 2022 7:25 PM

### Re: The Magnitude of Information

No, I agree, the boundary is potentially a real issue.

To explain why, let me back up a bit. I suspect I’m about to say some stuff you already know, Magnus, but I’ll say it anyway.

For any Riemannian manifold $M = (M, g)$, and any point $p$ of $M$, we have a function

$d(-, p)^2: M \to \mathbb{R},$

where $d$ is the geodesic distance derived from the Riemannian metric $g$. This function is smooth in a neighbourhood of $p$, so we can take its Hessian

$\mathrm{Hess}_p(d(-, p)^2),$

which is a bilinear form on the tangent space $T_p M$. But the original metric $g$ itself also gives a bilinear form on $T_p M$ — namely, $g_p$. These two forms are essentially equal:

$g_p = \frac{1}{2} \mathrm{Hess}_p(d(-, p)^2).$

The point of this equation is that it expresses the infinitesimal metric $g$ in terms of the global metric $d$.

(What I’ve just said seems to be obvious to Riemannian geometers, but I’ve never succeeded in finding a reference where it’s expressed this directly. It’s done on p.78 of my book.)

Now here’s the main thought. If we have some roughly distance-like thing $\delta$ that nevertheless doesn’t satisfy the metric space axioms, we can still use the formula

$g_p = \frac{1}{2} \mathrm{Hess}_p(\delta(-, p)^2)$

to define a Riemannian metric $g$. So from a “fake” distance function $\delta$, we derive a genuine Riemannian metric $g$. And from $g$, we can construct its geodesic distance function $d$, which does satisfy the metric space axioms.

(Aside: I’d love to see a purely elementary version of the construction $\delta \mapsto d$ that doesn’t refer to the manifold structure or any kind of smoothness.)

The point of all this is that you can take $\delta$ to be the square root of relative entropy on the manifold $\Delta_n^\circ$ of probability distributions on $\{1, \ldots, n\}$ of full support. And when we do this, the process just described spits out a genuine metric on $\Delta_n^\circ$, called the Fisher metric. Up to a scale factor, it makes $\Delta_n^\circ$ isometric to the positive orthant

$S^{n - 1}_+ := S^{n - 1} \cap (0, \infty)^n$

of the unit sphere with its geodesic metric. The isometry makes a probability distribution $(p_1, \ldots, p_n)$ correspond to a point $(\sqrt{p_1}, \ldots, \sqrt{p_n})$ of the sphere.
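
Here is a hedged worked example of this recipe in the simplest case $n = 2$. The parametrisation by $x$ and the substitution $p = \sin^2\theta$ are choices made for illustration. Write a distribution on $\{1, 2\}$ as $(x, 1 - x)$ with $x \in (0, 1)$, and take $\delta^2$ to be relative entropy:

$\delta(x, p)^2 = x \log\frac{x}{p} + (1 - x)\log\frac{1 - x}{1 - p}.$

Differentiating twice in $x$ and evaluating at $x = p$ gives

$\frac{\partial^2}{\partial x^2}\,\delta(x, p)^2 = \frac{1}{x} + \frac{1}{1 - x}, \qquad\text{so}\qquad g_p = \frac{1}{2}\,\mathrm{Hess}_p(\delta(-, p)^2) = \frac{1}{2p(1 - p)}.$

With $p = \sin^2\theta$ for $\theta \in (0, \pi/2)$, we have $\mathrm{d}p = 2\sin\theta\cos\theta\,\mathrm{d}\theta$, hence

$g = \frac{\mathrm{d}p^2}{2p(1 - p)} = 2\,\mathrm{d}\theta^2.$

So the geodesic distance is $\sqrt{2}\,|\theta - \theta'|$, while the point $(\sqrt{p}, \sqrt{1 - p}) = (\sin\theta, \cos\theta)$ runs along the open quarter circle $S^1_+$ with arc length $|\theta - \theta'|$. In this case the scale factor between the two metrics is $\sqrt{2}$.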

That’s all standard stuff in information geometry. But notice that the words “full support” snuck in. They mean that none of the probabilities are $0$, so $\Delta_n^\circ$ is an open simplex. If we allowed zero probabilities, it would be a manifold with boundaries, and not an honest manifold.

My impression is that this point requires care. For instance, the book Information Geometry of Ay, Jost, Lê and Schwachhöfer has a section called “Extending the Fisher metric to the boundary” (p. 33), which begins like this:

As is obvious from (2.13) and also from the first fundamental form (2.19), the Fisher metric is not defined at the boundary of the simplex. It is, however, possible to extend the Fisher metric to the boundary…

So even these experts in information geometry define the Fisher metric in a two-step process, first on the interior and then extending to the boundary.

Posted by: Tom Leinster on May 23, 2022 11:25 AM

### Re: The Magnitude of Information

Is Tom’s example of relative entropy a good local model for the non-smooth behaviour at the boundary?

The results in our recent preprints assume smoothness at the boundary. However, if for an important class of divergences the boundary behaviour is of a specific type, one should in principle be able to generalise the analysis in our papers to this given type of behaviour. Unfortunately, the technical details promise to be lengthy, so it would be crucial to identify the most relevant local model.

Posted by: Heiko Gimperlein on May 23, 2022 4:44 PM

### Towards applications

FWIW, the lowest hanging fruit for applications of magnitude to information geometry (taken broadly) would be to consider a graph with probability distributions on vertices (a setup which has any number of applied instantiations) and then consider divergences à la Kullback–Leibler. These are asymmetric but magnitude doesn’t care. My own experiments some time ago with (co)weightings on vanilla digraphs suggest that these will still pick out boundary-like features in somewhat the same way that Bessel potentials do in the Euclidean case. And this can be useful: for instance, there is probably a decent graph drawing algorithm that can be built on these observations.

Posted by: Steve Huntsman on May 23, 2022 1:53 PM

### Re: Towards applications

This would seem very much in the spirit of the early works on magnitude for finite metric spaces. It would be very interesting to me to see in examples whether the asymmetry of the divergence leads to interesting new behaviour of the magnitude function.

Posted by: Heiko Gimperlein on May 23, 2022 4:49 PM
