Category-Theoretic Characterizations of Entropy
Posted by John Baez
Tobias Fritz, Tom Leinster and I seem to be writing a paper on category-theoretic characterizations of entropy—not only the good old Shannon entropy, but also the Rényi entropy $H_q$, which depends on a parameter $q$ and reduces to the Shannon entropy as $q \to 1$. Curiously, you can’t get the Shannon entropy by just sticking $q = 1$ in the definition of Rényi entropy: you really need to sneak up on it, taking a limit as $q \to 1$.
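Just as a sanity check on that limit, here’s a throwaway Python snippet (a toy of mine, nothing from the paper) that computes the Rényi entropy $H_q(p) = \frac{1}{1-q}\ln\sum_i p_i^q$ and watches it creep toward the Shannon entropy $H(p) = -\sum_i p_i \ln p_i$ as $q \to 1$:

```python
import math

def shannon(p):
    """Shannon entropy H(p) = -sum_i p_i ln p_i (natural log)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def renyi(p, q):
    """Renyi entropy H_q(p) = (1/(1-q)) ln sum_i p_i^q, for q != 1."""
    return math.log(sum(pi ** q for pi in p)) / (1 - q)

p = [0.5, 0.25, 0.125, 0.125]
print("Shannon:", shannon(p))
for q in [0.5, 0.9, 0.99, 0.999, 1.001, 1.01, 2.0]:
    print(f"q = {q}: H_q = {renyi(p, q)}")
```

Plugging in $q = 1$ directly would give the indeterminate form $0/0$, which is why one has to sneak up on it.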
Tobias has worked on the category whose morphisms are stochastic matrices, and a category-theoretic approach to convex sets. Tom likes category-theoretic characterizations of familiar concepts. So it was natural for them to become interested in this topic. But for me, it started by trying to understand Rényi entropy.
A while back I wrote a paper on Rényi entropy and free energy and blogged about it here. I got a bunch of very helpful feedback, which improved the paper immensely and allowed me to state the main result this way: the Rényi entropy is a ‘q-deformed version’ of the ordinary entropy. The ordinary Shannon entropy can be defined as minus the derivative of the free energy $F$ with respect to temperature $T$. To get the Rényi entropy, we must use a generalization of the usual derivative, called a ‘q-derivative’:

$$H_q = -\left.\left(\frac{d F}{d T}\right)_{\!1/q}\right|_{T = 1} \qquad \text{where} \qquad \left(\frac{d f}{d x}\right)_{\!q} = \frac{f(qx) - f(x)}{qx - x}.$$
To be completely honest, this should be called a ‘1/q-derivative’. But that’s no big deal. What’s more interesting is that now it’s obvious why taking the limit $q \to 1$ is a good idea: that’s how q-derivatives reduce to ordinary derivatives!
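Just to see this in action, here’s another throwaway numerical check (again a toy sketch of mine, not anything from the paper). I’ll use one convenient normalization of the free energy: if $p$ is the Gibbs state of some system at temperature $1$, we can take $F(T) = -T \ln \sum_i p_i^{1/T}$, so that $F(1) = 0$; any other normalization of the energies just shifts $F$ by a constant, which cancels in the q-derivative. Then minus the $1/q$-derivative of $F$ at $T = 1$ should be the Rényi entropy $H_q$:

```python
import math

def renyi(p, q):
    """Renyi entropy H_q(p) = (1/(1-q)) ln sum_i p_i^q, for q != 1."""
    return math.log(sum(pi ** q for pi in p)) / (1 - q)

def free_energy(p, T):
    """F(T) = -T ln sum_i p_i^(1/T): free energy of a system whose
    Gibbs state at temperature 1 is p, normalized so that F(1) = 0."""
    return -T * math.log(sum(pi ** (1 / T) for pi in p))

def q_derivative(f, x, q):
    """The q-derivative (f(qx) - f(x)) / (qx - x)."""
    return (f(q * x) - f(x)) / (q * x - x)

p = [0.5, 0.25, 0.125, 0.125]
for q in [0.5, 2.0, 3.0]:
    lhs = -q_derivative(lambda T: free_energy(p, T), 1.0, 1 / q)
    print(q, lhs, renyi(p, q))   # the two columns should agree
```

The two columns agree, and as $q \to 1$ both converge to the ordinary entropy, namely minus $\partial F/\partial T$ at $T = 1$.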
Even more interesting is that q-derivatives show up throughout the theory of quantum groups, the combinatorics of finite fields, and elsewhere. Does that shed light on why they’re showing up here too? I wish I knew.
To better understand the connection between -derivatives and Rényi entropy, it would help to understand what’s important—if anything—about Rényi entropy. It’s not exactly a central concept in statistical mechanics. As far as I can tell, it’s more of a curiosity. But there are a few interesting results about it here and there, and one of the more interesting can be found in Rényi’s original paper:
- A. Rényi, On measures of information and entropy, in Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1960, pp. 547–561. Also freely available online via Project Euclid.
This paper starts with a nice characterization of Shannon entropy due to Faddeev, and then Theorem 3 gives a characterization of Rényi entropy.
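In case you haven’t seen it, here is roughly what Faddeev’s characterization says (I’m quoting from memory, so consult Rényi’s paper for the precise statement): suppose that for each $n$ we have a symmetric, continuous function $I(p_1, \dots, p_n)$ of probability distributions satisfying

$$I(p_1, p_2, p_3, \dots, p_n) = I(p_1 + p_2, p_3, \dots, p_n) + (p_1 + p_2)\, I\!\left(\frac{p_1}{p_1 + p_2}, \frac{p_2}{p_1 + p_2}\right)$$

and normalized so that $I(\tfrac{1}{2}, \tfrac{1}{2}) = 1$. Then $I$ is forced to be the Shannon entropy, with logarithms taken base 2.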
Now Tom Leinster has taken Faddeev’s characterization of Shannon entropy and stated it in beautiful category-theoretic terms. Tobias Fritz suggested a possible avenue toward doing something similar with Rényi entropy, but we’re not there yet. We were discussing this on my other blog, but when Tom uttered the word ‘2-monad’, I realized it was time to move the conversation here.
Most of our ideas so far are recorded here:
If you stare at this stuff carefully, you’ll notice an interesting ‘level shift’ is at work, both in our treatment of:
• finite probability measure spaces (which have entropies),
and:
• nonnegative real numbers (which are entropies: in other words, entropy takes values in $[0,\infty)$).
For the nonnegative real numbers, this level shift goes like this:
1. Sometimes we think of $[0,\infty)$ as a category with objects the numbers in $[0,\infty)$ and a unique morphism from $x$ to $y$ iff $x \ge y$.
2. Sometimes we think of $[0,\infty)$ as a category with one object, with morphisms the numbers in $[0,\infty)$ and composition given by addition.
Of course there’s no real conflict: we can combine these viewpoints by working with a monoidal category with $[0,\infty)$ as its set of objects, a unique morphism from $x$ to $y$ iff $x \ge y$, and addition as the tensor product. Indeed we sometimes want to get multiplication into the game too, and then we get a ‘rig category’. And sometimes we want to use the topology on $[0,\infty)$ too. Then we’ve got a rig category internal to $\mathrm{Top}$.
All this is bog standard¹: it’s just a fancy way of thinking about $[0,\infty)$ as a topological ordered rig.
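If anyone wants something painfully concrete, here’s a little Python doodle (purely illustrative, and obviously no substitute for the categorical machinery) of that combined viewpoint: objects are nonnegative reals, there’s one morphism from $x$ to $y$ precisely when $x \ge y$, and the tensor product of objects is addition. The check at the end is just the fact that makes the tensor product work on morphisms: if $x \ge y$ and $x' \ge y'$ then $x + x' \ge y + y'$.

```python
import random

def hom(x, y):
    """The hom-set from x to y: one morphism if x >= y, none otherwise.
    We represent the unique morphism as the pair (x, y)."""
    return [(x, y)] if x >= y else []

def tensor_objects(x, y):
    """Tensor product of objects: addition in [0, infinity)."""
    return x + y

def tensor_morphisms(f, g):
    """Tensor product of morphisms (x, y) and (x2, y2): (x + x2, y + y2)."""
    (x, y), (x2, y2) = f, g
    return (x + x2, y + y2)

# Spot check: tensoring two morphisms always gives a morphism,
# i.e. x >= y and x2 >= y2 imply x + x2 >= y + y2.
for _ in range(1000):
    x, x2 = random.uniform(0, 10), random.uniform(0, 10)
    y, y2 = random.uniform(0, x), random.uniform(0, x2)
    f, g = hom(x, y)[0], hom(x2, y2)[0]
    fx, fy = tensor_morphisms(f, g)
    assert hom(fx, fy), "tensor of morphisms failed to be a morphism"
print("tensor product is defined on all tested morphisms")
```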
For finite probability measure spaces, the corresponding level shift is a bit more murky in my mind, and that’s what I’d like to start by talking about. I’m hoping it works in a very analogous way! But let’s see:
1. Sometimes we work with the category where finite probability measure spaces are objects and measure-preserving maps are morphisms.
2. Sometimes we work with an operad, the ‘Giry operad’, where the set of $n$-ary operations is the set of probability measures on the $n$-point set.
Of course this set is also known as the $(n-1)$-simplex, so all of a sudden instead of doing statistical mechanics we’re in the simplicial world that homotopy theorists like to inhabit! Maybe these worlds are connected more intimately than people suspect. But never mind that for now. My question is:
What’s a nice way to combine viewpoints 1. and 2. for finite probability measure spaces, which is closely analogous to something that also works for $[0,\infty)$?
We could then hope to say that the Shannon entropy is a map

$$H \colon (\text{finite probability measure spaces}) \longrightarrow [0,\infty)$$

that preserves a lot of the structure that finite probability measure spaces and $[0,\infty)$ have in common. Indeed, it might be uniquely characterized this way, based on Tom’s reinterpretation of Faddeev’s theorem.
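Here’s a hint of what ‘preserving the structure’ ought to boil down to, at least in my own paraphrase of the operadic picture (so take the framing with a grain of salt): the Giry operad composes a distribution $p$ on $n$ points with distributions $q^1, \dots, q^n$ by forming the distribution with entries $p_i q^i_j$, and Shannon entropy turns this composition into the familiar chain rule $H(p \circ (q^1, \dots, q^n)) = H(p) + \sum_i p_i H(q^i)$. A quick numerical check:

```python
import math

def shannon(p):
    """Shannon entropy with natural logarithms."""
    return -sum(x * math.log(x) for x in p if x > 0)

def operad_compose(p, qs):
    """Giry operad composition: a distribution p on n points and
    distributions qs[0], ..., qs[n-1] give the distribution whose
    entries are p[i] * qs[i][j], listed in order."""
    return [p[i] * qij for i in range(len(p)) for qij in qs[i]]

p = [0.5, 0.3, 0.2]
qs = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.3, 0.5]]

lhs = shannon(operad_compose(p, qs))
rhs = shannon(p) + sum(pi * shannon(qi) for pi, qi in zip(p, qs))
print(lhs, rhs)   # these should agree: the chain rule for entropy
```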
You’ll notice an irksome apparent disanalogy between what I did with $[0,\infty)$ and what I’m doing with finite probability measure spaces: in the second case, and only in that case, do we see operads entering the game.
But that’s probably because things haven’t been cleaned up enough yet.
I think this is one of those puzzles where, now that I’ve finally had the wits to state it, I should be able to solve it. Anyway, this blog entry is mainly meant as a place where Tobias, Tom and interested bystanders can join me in discussing our paper.
¹ I love it when Brits say ‘bog standard’, so I wanted to see if I could pull it off convincingly. How did I do? Apparently England is so wet that bogs are considered ‘standard’. I just like the way it sounds: curt and dismissive.
Re: Category-Theoretic Characterizations of Entropy
So, to start answering my own question, let me cook up a monoidal category that’s related to probability measure spaces but also to the ‘Giry operad’, with a monoidal structure somehow analogous to addition in $[0,\infty)$.
Whenever we have any monoidal category $C$ and any object $x \in C$, we get an operad, the co-concrete operad of $x$, whose set of $n$-ary operations is $\hom(x, x^{\otimes n})$.
So, let’s create a category where the objects are finite sets and the morphisms are ‘stochastic maps’. More precisely, a morphism from $X$ to $Y$ in this category is a function $f \colon X \to P(Y)$, where $P(Y)$ is the space of probability measures on $Y$. This functor $P$ is part of a monad called the ‘Giry monad’, and our ability to compose morphisms arises from this fact.
In jargonesque terms, we’re forming the full subcategory of the Kleisli category of the Giry monad, where the objects are just finite sets.
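For anyone who wants the composition spelled out: on finite sets a stochastic map is just a stochastic matrix, and Kleisli composition for the Giry monad amounts to matrix multiplication. Here’s a small Python illustration in my own toy encoding (nothing canonical about it), where a morphism from $X$ to $Y$ is a function sending each point of $X$ to a probability distribution on $Y$, written as a dictionary:

```python
def kleisli_compose(g, f):
    """Compose stochastic maps f: X -> P(Y) and g: Y -> P(Z) to get
    a stochastic map X -> P(Z), by averaging g over the measure f(x)."""
    def composite(x):
        dist = {}
        for y, p in f(x).items():
            for z, q in g(y).items():
                dist[z] = dist.get(z, 0.0) + p * q
        return dist
    return composite

# f: a 2-element set {'a','b'} -> probability measures on {'u','v'}
f = lambda x: {'a': {'u': 0.3, 'v': 0.7}, 'b': {'u': 1.0}}[x]
# g: {'u','v'} -> probability measures on {0, 1}
g = lambda y: {'u': {0: 0.5, 1: 0.5}, 'v': {0: 0.1, 1: 0.9}}[y]

h = kleisli_compose(g, f)
print(h('a'))   # {0: 0.3*0.5 + 0.7*0.1, 1: 0.3*0.5 + 0.7*0.9}
print(sum(h('a').values()), sum(h('b').values()))   # both 1.0
```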
Let’s hope this category becomes monoidal with disjoint union of sets as the tensor product.
Then the set of morphisms from the 1-element set to its $n$-fold disjoint union, the $n$-element set, is the set of probability measures on the $n$-element set. So, the co-concrete operad of the 1-element set seems to be the Giry operad!
The appearance of the ‘co’ here is a bit curious, but otherwise I like this construction.
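To tie this back to the Giry operad as I described it earlier: in the same toy encoding, a morphism from the 1-element set to the $n$-element set is literally a probability distribution on $n$ points, and composing in the co-concrete operad, that is, forming $(q^1 \sqcup \dots \sqcup q^n) \circ p$ with disjoint union as the tensor product, reproduces the composition with entries $p_i q^i_j$ that showed up in the entropy chain rule above. A sketch:

```python
def coproduct_of_maps(qs):
    """Disjoint union of morphisms q_i: 1 -> k_i, as a stochastic map
    sending the i-th point to the distribution q_i on the i-th block."""
    def h(i):
        return {(i, j): qij for j, qij in enumerate(qs[i])}
    return h

def compose_with(p, qs):
    """Kleisli composite (disjoint union of the q_i) after p: 1 -> n."""
    dist = {}
    for i, pi in enumerate(p):
        for key, q in coproduct_of_maps(qs)(i).items():
            dist[key] = dist.get(key, 0.0) + pi * q
    return dist

p = [0.5, 0.3, 0.2]
qs = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.3, 0.5]]
print(compose_with(p, qs))   # entries p_i * q^i_j, indexed by (i, j)
```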