Shannon Entropy from Category Theory

May 2, 2022

Shannon Entropy from Category Theory

Posted by John Baez

$MathML-enabled post (click for more details).$

I’m giving a talk at Categorical Semantics of Entropy on Wednesday May 11th, 2022. You can watch it live on Zoom if you register, or recorded later. Here’s the idea:

Shannon entropy is a powerful concept. But what properties single out Shannon entropy as special? Instead of focusing on the entropy of a probability measure on a finite set, it can help to focus on the “information loss”, or change in entropy, associated with a measure-preserving function. Shannon entropy then gives the only concept of information loss that is functorial, convex-linear and continuous. This is joint work with Tom Leinster and Tobias Fritz.

You can see the slides now, here.

$MathML-enabled post (click for more details).$

It’s fun to talk about work that I did with Tobias Fritz and Tom Leinster here on the $n$ -Café — I’ve never given a talk where I went into as much detail as I will now. In fact I will talk a bit about all these:

• John Baez, Tobias Fritz and Tom Leinster, A characterization of entropy in terms of information loss, 2011.

• Tom Leinster, An operadic introduction to entropy, 2011.

• John Baez and Tobias Fritz, A Bayesian characterization of relative entropy, 2014.

• Tom Leinster, A short characterization of relative entropy, 2017.

• Nicolas Gagné and Prakash Panangaden, A categorical characterization of relative entropy on standard Borel spaces, 2017.

• Tom Leinster, Entropy and Diversity: the Axiomatic Approach, 2020.

• Arthur Parzygnat, A functorial characterization of von Neumann entropy, 2020.

• Arthur Parzygnat, Towards a functorial description of quantum relative entropy, 2021.

• Tai-Danae Bradley, Entropy as a topological operad derivation, 2021.

Posted at May 2, 2022 1:37 AM UTC

TrackBack URL for this Entry: https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/3394

7 Comments & 0 Trackbacks

Matrices

$MathML-enabled post (click for more details).$

It seems a little weird to be working with actual functions between finite sets, rather than matrices where each row is a probability measure. Is this what’s happening when you talk about “convex combinations of functions”? I couldn’t quite parse that slide.

Posted by: Allen Knutson on May 2, 2022 12:21 PM | Permalink | Reply to this

Re: Matrices

$MathML-enabled post (click for more details).$

If my understanding is correct, the notion of a convex linear combination of morphisms is a (well-motivated) “notation trick”. With a different choice of notation it might be less confusing.

To make sense of it in a very wordy way, note that there is a forgetful functor:

(1)

F \colon \mathbf{FinMeas} \longrightarrow \mathbf{FinSet}

acting on objects by $F[(X,p)] = X$ , and a morphism $f \colon (X, p) \to (Y, q)$ in $\mathbf{FinMeas}$ by forgetting the information about $p$ and $q$ and just remembering the function $X \to Y$ , which we denote $F(f)$ .

As a remark: remember that a morphism in FinMeas is simply a morphism in $\mathbf{FinSet}$ that satisfies some condition based on the extra data of the measures. Thus, the forgetful functor above roughly preserves data, but forgets information about condition (which wouldn’t make sense in FinSet anyway).

With that in mind, for any two morphisms $f \colon (X, p) \to (X', p')$ and $g \colon (Y,q) \to (Y', q')$ the convex linear combination: $\lambda f + (1 - \lambda) g$ for any $\lambda \in [0,1]$ is given by the data of its underlying function:

(2)

F(\lambda f + (1 - \lambda) g) = F(f) \coprod F(g): X \coprod Y \longrightarrow X' \coprod Y'

and it is guaranteed to satisfy the appropriate condition that makes it into a morphism from $(X \coprod Y, \lambda p + (1-\lambda) q)$ to $(X' \coprod Y', \lambda p' + (1-\lambda)q')$ . Note that the value of $\lambda$ doesn’t tell us anything about the underlying functions in $\mathbf{FinSet}$ , but tells us about info about what objects we are considering in $\mathbf{FinMeas}$ .

That might clarify things a bit more.

Posted by: Tom Mainiero on May 12, 2022 5:36 PM | Permalink | Reply to this

Re: Matrices

$MathML-enabled post (click for more details).$

To actually answer your question though: I don’t think the convex combination construction of morphisms has to do anything with some linear structures when you interpret functions between finite sets as matrices.

Posted by: Tom Mainiero on May 12, 2022 6:05 PM | Permalink | Reply to this

Re: Matrices

$MathML-enabled post (click for more details).$

Everything Tom just said is true.

Posted by: John Baez on May 13, 2022 8:06 PM | Permalink | Reply to this

Re: Matrices

$MathML-enabled post (click for more details).$

Allan wrote:

It seems a little weird to be working with actual functions between finite sets, rather than matrices where each row is a probability measure.

The latter are called ‘stochastic maps’ and they don’t always decrease entropy. But measure-preserving functions — actual functions between finite sets — always decrease entropy. As I point out, the latter fact is an example of the ‘data processing inequality’. And my slides give this slogan for what’s going on:

Deterministic processing of random data always decreases entropy!

Allan wrote:

Is this what’s happening when you talk about “convex combinations of functions”? I couldn’t quite parse that slide.

Here’s what the slide says:

We can define convex linear combinations of objects in $FinProb$ . For any $0 \le \lambda \le 1$ , let

(1)

\lambda (X,p) \;+ \;(1 - \lambda) (Y,q)

be the disjoint union of $X$ and $Y$ , with the probability distribution given by $\lambda p$ on $X$ and $(1-\lambda)q$ on $Y$ .

We can also define convex linear combinations of morphisms.

(2)

f : (X,p) \to (X',p') , \qquad g : (Y,q) \to (Y', q')

give

(3)

\lambda f + (1-\lambda)g : \lambda (X,p) + (1 - \lambda) (Y,q) \to \lambda (X',p') + (1 - \lambda) (Y',q')

This is simply the function that equals $f$ on $X$ and $g$ on $Y$ .

As I emphasized in my actual talk, the big scary-looking equation (3) is really no big deal: it’s just saying the source and target of a function between finite sets, and its name! And the function itself is very simple: it’s just the function that equals $f$ on the set $X$ and $g$ on the set $Y$ .

The point is that taking convex linear combinations is functorial on $FinProb$ : we can do it both to objects and morphisms, and the usual rules of a functor apply. So what I’m really saying is the $FinProb$ is a category internal to the category of convex spaces.

Posted by: John Baez on May 13, 2022 8:03 PM | Permalink | Reply to this

Slides

$MathML-enabled post (click for more details).$

I don’t see the slides when I click on the link.

Posted by: Mozibur Rahman Ullah on May 11, 2022 7:04 AM | Permalink | Reply to this

Re: Slides

$MathML-enabled post (click for more details).$

I do. But if you go here, you should be able to see the slides and a video both for my talk and Tai-Danae Bradley’s related talk “Operads and entropy”.

Posted by: John Baez on May 13, 2022 7:45 PM | Permalink | Reply to this

The n-Category Café

Skip to the Main Content

May 2, 2022