Comagnitude 1
Posted by Tom Leinster
In this post and the next, I want to try out a new idea and see where it leads. It goes back to where magnitude began, which was the desire to unify elementary counting formulas like the inclusion-exclusion principle and the simple formula for the number of orbits in a free action of a group on a finite set.
To prepare the ground for comagnitude, I need to present magnitude itself in a slightly different way from usual. I won’t assume you know anything about magnitude, but if you do, watch out for something new: a connection between magnitude and entropy (ordinary, relative and conditional) that I don’t think has quite been articulated before.
The inclusion-exclusion formula tells us the cardinality of a union, which in categorical terms is a pushout. The set of orbits of a free action of a group $G$ on a finite set is the colimit of the associated functor $G \to \mathbf{FinSet}$. (Here $G$ is regarded as the one-object category whose maps are the elements of $G$.) Generally, we’re going to think about the cardinality of the colimit of a functor $X \colon \mathbf{A} \to \mathbf{FinSet}$, where $\mathbf{A}$ is a finite category.
In both the examples I just mentioned, the cardinality of the colimit is a $\mathbb{Q}$-linear combination of the cardinalities of the individual sets $X(a)$, taken over all objects $a$. For inclusion-exclusion, we take
$$\mathbf{A} = (b \leftarrow a \to c)$$
and get
$$|\operatorname{colim} X| = |X(b)| + |X(c)| - |X(a)|.$$
For a free action of a group $G$, we take $\mathbf{A} = G$. If the single object of $G$ is called $a$, then
$$|\operatorname{colim} X| = \frac{1}{|G|} |X(a)|$$
— the cardinality of the set being acted on, divided by the order of $G$.
Generally, there’s no hope of computing the cardinality of the colimit of a functor $X \colon \mathbf{A} \to \mathbf{FinSet}$ in terms of the cardinalities of the sets $X(a)$ alone. The colimit usually depends on what the functor does to morphisms, as well as objects. So in order to pull this off, we’re going to have to confine ourselves to only those functors that are especially convenient. The hope is that for any $\mathbf{A}$, there are rational numbers $w_a$ ($a \in \mathbf{A}$) such that for all “convenient” functors $X \colon \mathbf{A} \to \mathbf{FinSet}$,
$$|\operatorname{colim} X| = \sum_{a \in \mathbf{A}} w_a |X(a)|.$$
Here’s how to simultaneously figure out what the coefficients in this weighted sum must be and what “convenient” should mean.
The starting thought is that most notions of “nice enough” for set-valued functors include the representables. Since the colimit of any representable $\mathbf{A}(b, -)$ is the one-element set, requiring the last equation to hold for all representables means that
$$\sum_{a \in \mathbf{A}} w_a |\mathbf{A}(b, a)| = 1$$
for all $b \in \mathbf{A}$.
And now we’re getting somewhere! This is a system of equations with the same number of equations as unknowns $w_a$, namely, the number of objects of $\mathbf{A}$. And in this situation, there’s typically a unique solution, at least if we work over the field $\mathbb{Q}$.
A family $(w_a)_{a \in \mathbf{A}}$ of rational numbers satisfying
$$\sum_{a \in \mathbf{A}} w_a |\mathbf{A}(b, a)| = 1$$
for all $b \in \mathbf{A}$ is called a weighting on $\mathbf{A}$, and $w_a$ is called the weight of $a$. For the purposes of this post, I’ll assume there’s always exactly one weighting on $\mathbf{A}$. That’s not entirely justified: there are examples of finite categories with no weighting, and others with more than one. (See Examples 1.11 of my paper The Euler characteristic of a category, where all the stuff I’m talking about right now is worked out.) But it’s safe enough.
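To make the weighting equations concrete, here’s a small Python sketch (the `weighting` function and the encoding of $\mathbf{A}$ as a matrix of hom-set cardinalities are mine, not from the post). It solves the system $\sum_a w_a |\mathbf{A}(b, a)| = 1$ over the rationals by Gaussian elimination, and recovers the weighting $(-1, 1, 1)$ on the pushout shape.

```python
from fractions import Fraction

def weighting(zeta):
    """Solve sum_a w_a * |A(b, a)| = 1 for all objects b, over the rationals.

    `zeta` is the matrix of hom-set cardinalities, zeta[b][a] = |A(b, a)|.
    Plain Gauss-Jordan elimination; assumes the solution exists and is unique.
    """
    n = len(zeta)
    M = [[Fraction(x) for x in row] + [Fraction(1)] for row in zeta]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                M[r] = [x - M[r][col] * y for x, y in zip(M[r], M[col])]
    return [row[n] for row in M]

# Pushout shape (b <- a -> c), objects ordered [a, b, c]: the only
# non-identity maps are a -> b and a -> c.
zeta = [
    [1, 1, 1],   # hom-sets out of a
    [0, 1, 0],   # hom-sets out of b
    [0, 0, 1],   # hom-sets out of c
]
print(weighting(zeta))   # weights -1, 1, 1
```

For a group of order $n$ viewed as a one-object category, the same solver on the $1 \times 1$ matrix `[[n]]` returns the single weight $1/n$, matching the orbit-counting example.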
So, take the weighting $(w_a)$ on $\mathbf{A}$. By definition, the class of functors $X$ satisfying
$$|\operatorname{colim} X| = \sum_{a \in \mathbf{A}} w_a |X(a)|$$
contains the representables. But it’s easy to see that it’s also closed under coproducts. So this equation — which I’ll call the colimit formula — holds for all coproducts of representables!
Which functors are coproducts of representables? They’re sometimes called the “familially representable” functors, the idea being that a coproduct of representables is represented by a family $(a_i)_{i \in I}$ of objects. But in my last post, I called them the “free functors”. The way I put it there was that the forgetful functor
$$[\mathbf{A}, \mathbf{Set}] \to \mathbf{Set}^{\operatorname{ob}(\mathbf{A})}$$
has both a left and a right adjoint, and a functor is free if and only if it’s in the image of the left adjoint. That means
$$X \cong \coprod_{a \in \mathbf{A}} S_a \times \mathbf{A}(a, -)$$
for some family $(S_a)_{a \in \mathbf{A}}$ of sets.
Examples
When $\mathbf{A}$ is the discrete two-object category $\{a, b\}$, both weights are $1$, and the general colimit formula
$$|\operatorname{colim} X| = \sum_{a \in \mathbf{A}} w_a |X(a)|$$
reduces to something very basic:
$$|X(a) + X(b)| = |X(a)| + |X(b)|$$
— the cardinality of a coproduct is the sum of the individual cardinalities. When $\mathbf{A}$ is a discrete category, all functors $\mathbf{A} \to \mathbf{Set}$ are coproducts of representables.
When $\mathbf{A} = (b \leftarrow a \to c)$ is the shape for pushouts, a functor $X$ is a diagram
$$X(b) \leftarrow X(a) \to X(c)$$
of finite sets, and is a coproduct of representables if and only if both these maps are injections. Assume $X$ is a coproduct of representables. The weighting on $\mathbf{A}$ is $(w_a, w_b, w_c) = (-1, 1, 1)$, so the colimit formula says that the cardinality of a pushout is
$$|X(b) +_{X(a)} X(c)| = |X(b)| + |X(c)| - |X(a)|$$
— the inclusion-exclusion formula.
For a group $G$ (or more generally, a monoid), the weight of the unique object of $G$ is $1/|G|$, the reciprocal of the order. Given an action of $G$ on a finite set, the corresponding functor $G \to \mathbf{FinSet}$ is a coproduct of representables if and only if the action is free. In that case, the colimit formula says that the number of orbits is the cardinality of the set divided by $|G|$.
Finally, when $\mathbf{A} = (a \rightrightarrows b)$, the colimit formula says that the coequalizer of $X(a) \rightrightarrows X(b)$ has $|X(b)| - |X(a)|$ elements, provided that the two functions from $X(a)$ to $X(b)$ are injective with disjoint images.
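As a sanity check on the pushout example, here’s a quick computation (the encoding is my own: finite sets as Python sets, functions as dicts, pushouts computed by union-find) comparing an actual pushout’s cardinality with the inclusion-exclusion formula.

```python
def pushout_size(X, Y, Z, f, g):
    """Cardinality of the pushout of Y <-f- X -g-> Z in finite sets.

    f and g are dicts X -> Y and X -> Z; the pushout is the disjoint
    union of Y and Z with f(x) glued to g(x) for each x in X,
    computed by union-find.
    """
    parent = {('Y', y): ('Y', y) for y in Y}
    parent.update({('Z', z): ('Z', z) for z in Z})

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for x in X:
        a, b = find(('Y', f[x])), find(('Z', g[x]))
        if a != b:
            parent[a] = b
    return len({find(u) for u in parent})

# Both maps injective: the formula |Y| + |Z| - |X| applies.
X, Y, Z = {0, 1}, {0, 1, 2}, {0, 1, 2, 3}
f, g = {0: 0, 1: 1}, {0: 2, 1: 3}
print(pushout_size(X, Y, Z, f, g), len(Y) + len(Z) - len(X))  # 5 5

# Non-injective maps: the formula can fail.
f2, g2 = {0: 0, 1: 0}, {0: 0, 1: 0}
print(pushout_size(X, {0}, {0}, f2, g2))  # 1, but |Y| + |Z| - |X| = 0
```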
Aside for category theorists Under finiteness conditions, and assuming that $\mathbf{A}$ is Cauchy complete, a functor $X \colon \mathbf{A} \to \mathbf{Set}$ is a coproduct of representables if and only if it is flat with respect to the class of finite connected limits. This might seem more abstract than the original condition, but it actually gives a concrete, testable condition for $X$ to be a coproduct of representables. (See Lemma 5.2 and Definition 3.2 here.) This result enables one to see rather quickly that in all the examples above, the functors arising as coproducts of representables are exactly what I said they are.
Summary so far We’ve seen that a finite category $\mathbf{A}$ typically has a unique weighting — meaning a family $(w_a)_{a \in \mathbf{A}}$ of rational numbers satisfying
$$\sum_{a \in \mathbf{A}} w_a |\mathbf{A}(b, a)| = 1$$
for all $b \in \mathbf{A}$. And we’ve seen that when $X \colon \mathbf{A} \to \mathbf{FinSet}$ is a coproduct of representables, the “colimit formula” holds:
$$|\operatorname{colim} X| = \sum_{a \in \mathbf{A}} w_a |X(a)|.$$
Now, what if $X$ isn’t a coproduct of representables? The colimit formula won’t usually hold. But the right-hand side of this non-equation still calculates something. What is it?
Let me define the magnitude $|X|$ of a functor $X \colon \mathbf{A} \to \mathbf{FinSet}$ by
$$|X| = \sum_{a \in \mathbf{A}} w_a |X(a)|.$$
This is equal to $|\operatorname{colim} X|$ if $X$ is a coproduct of representables, but not in general.
For example, if we take a monoid $M$ acting on a finite set $S$, the magnitude of the corresponding functor is $|S|/|M|$ — the cardinality of the set divided by the order of $M$. Unless the action is free, this isn’t usually the number of orbits, and it might not even be an integer.
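A tiny illustration, with my own choice of example: the two-element group acting on a three-element set with one fixed point has magnitude $3/2$ but two orbits, whereas for a free action the two numbers agree.

```python
from fractions import Fraction

def orbits(S, action):
    """Orbits of a group action on the finite set S.

    `action` lists the permutations (as dicts) given by each group element,
    so {g[x] for g in action} is the full orbit of x.
    """
    remaining, orbs = set(S), []
    while remaining:
        x = remaining.pop()
        orb = {g[x] for g in action}
        remaining -= orb
        orbs.append(orb)
    return orbs

# Non-free: C2 acts on {0, 1, 2} by swapping 0 and 1, fixing 2.
S = {0, 1, 2}
action = [{0: 0, 1: 1, 2: 2}, {0: 1, 1: 0, 2: 2}]
print(Fraction(len(S), len(action)), len(orbits(S, action)))  # 3/2 vs 2 orbits

# Free: C2 acts on {0, 1, 2, 3} by swapping 0<->1 and 2<->3.
T = {0, 1, 2, 3}
free = [{x: x for x in T}, {0: 1, 1: 0, 2: 3, 3: 2}]
print(Fraction(len(T), len(free)), len(orbits(T, free)))  # both 2: they agree
```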
The functor $\Delta 1$ sending everything in $\mathbf{A}$ to the one-element set has magnitude
$$|\Delta 1| = \sum_{a \in \mathbf{A}} w_a,$$
which is called the magnitude or Euler characteristic $|\mathbf{A}|$ of $\mathbf{A}$. (Under finiteness hypotheses, it’s equal to the topological Euler characteristic of the classifying space of $\mathbf{A}$.) Hence the magnitude of a category is a special case of the magnitude of a set-valued functor. Conversely, one can show that the magnitude of a functor is equal to the magnitude of its category of elements.
So, the concepts of magnitude of a category (already studied extensively) and the magnitude of a set-valued functor (a term I think I’m introducing for the first time here) are each special cases of the other. But in these posts, unlike everything I’ve written about magnitude before, I’m going to give prime position to magnitude of set-valued functors rather than of categories.
Does this definition of the magnitude of a set-valued functor have sensible properties? It does!
First, if $X$ and $Y$ are isomorphic then certainly $|X| = |Y|$. Second, and a bit less trivially, magnitude is invariant under equivalence in the following sense: if we compose $X \colon \mathbf{A} \to \mathbf{FinSet}$ with an equivalence of categories $E \colon \mathbf{B} \to \mathbf{A}$ then
$$|X \circ E| = |X|.$$
In fact, this holds just as long as $E$ has a left adjoint.
The term “magnitude” suggests it should behave in a cardinality-like way, and it does that too. For example, given functors $X, Y \colon \mathbf{A} \to \mathbf{FinSet}$,
$$|X + Y| = |X| + |Y|,$$
where the left-hand side is the magnitude of the coproduct of our two functors. A bit more generally, suppose we have a diagram of functors and natural transformations
$$Y \leftarrow X \to Z$$
such that for each $a \in \mathbf{A}$, the functions
$$X(a) \to Y(a), \qquad X(a) \to Z(a)$$
are injective with disjoint images (or more abstractly, for each $a$, the corresponding pushout-shaped diagram of finite sets is a coproduct of representables). Then one can show that the magnitude of the pushout is given by
$$|Y +_X Z| = |Y| + |Z| - |X|.$$
So magnitude in this sense obeys the inclusion-exclusion formula. More generally still, there’s a similar result for any shape of colimit, but I won’t write it out here. It’s exactly what the pushout case suggests.
I hope I’ve driven home the message that although the colimit formula
$$|\operatorname{colim} X| = \sum_{a \in \mathbf{A}} w_a |X(a)|$$
for functors $X \colon \mathbf{A} \to \mathbf{FinSet}$ holds when $X$ is a coproduct of representables, it doesn’t hold for arbitrary $X$. So perhaps it’s interesting to see how much the two sides of the equation differ when $X$ isn’t a coproduct of representables. In other words, let’s look at the difference
$$|\operatorname{colim} X| - \sum_{a \in \mathbf{A}} w_a |X(a)| = |\operatorname{colim} X| - |X|.$$
It seems to me that this quantity could be interesting. However, I don’t have a strong feel for what it means yet, so for now I’ll give it a bland name: the discrepancy of $X$.
Example The discrepancy of a coequalizer
$$X(a) \overset{f}{\underset{g}{\rightrightarrows}} X(b) \to \operatorname{coeq}(f, g)$$
is
$$|\operatorname{coeq}(f, g)| - |X(b)| + |X(a)|.$$
We’ve already seen that this is $0$ if our two functions are injective with disjoint images. But in general, it looks rather like an Euler characteristic. In fact, from our coequalizer we can form a chain complex of free abelian groups
$$0 \to \mathbb{Z}X(a) \xrightarrow{\ \partial\ } \mathbb{Z}X(b) \to \mathbb{Z}\operatorname{coeq}(f, g) \to 0,$$
where $\mathbb{Z}S$ means the free abelian group on a set $S$, the map $\partial$ is induced by the difference $g - f$ of the original functions $f, g \colon X(a) \to X(b)$, and the last map is induced by the quotient function $X(b) \to \operatorname{coeq}(f, g)$. Then the discrepancy of our coequalizer is equal to the Euler characteristic of this complex.
This complex is exact except maybe at $\mathbb{Z}X(a)$. Let’s do a little check: if $f$ and $g$ are injective with disjoint images then $\partial$ is injective too, so the complex is exact everywhere, which implies that the Euler characteristic is $0$. Since the discrepancy is also $0$ in this case, we’ve just confirmed one case of the result that the discrepancy of the coequalizer is the Euler characteristic of the resulting complex.
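Here’s a small computational check of the coequalizer discrepancy (the encoding is mine, in the same style as before): it vanishes for injective maps with disjoint images, and for a pair of equal non-injective maps — where the boundary map $g - f$ is zero, so the Euler characteristic is $|X(a)| - |X(b)| + |\operatorname{coeq}|$ — the two quantities again agree.

```python
def coequalizer_size(Xa, Xb, f, g):
    """|coeq(f, g)| for f, g : Xa -> Xb given as dicts, via union-find on Xb."""
    parent = {y: y for y in Xb}

    def find(y):
        while parent[y] != y:
            parent[y] = parent[parent[y]]
            y = parent[y]
        return y

    for x in Xa:
        a, b = find(f[x]), find(g[x])
        if a != b:
            parent[a] = b
    return len({find(y) for y in Xb})

def discrepancy(Xa, Xb, f, g):
    """|coeq(f, g)| - |X(b)| + |X(a)|, the discrepancy of the coequalizer."""
    return coequalizer_size(Xa, Xb, f, g) - len(Xb) + len(Xa)

# Injective with disjoint images: discrepancy 0.
print(discrepancy({0, 1}, {0, 1, 2, 3, 4}, {0: 0, 1: 1}, {0: 2, 1: 3}))  # 0

# f = g (non-injective): coeq is all of X(b), and the discrepancy matches
# the Euler characteristic 2 - 3 + 3 = 2 of the chain complex.
print(discrepancy({0, 1}, {0, 1, 2}, {0: 0, 1: 0}, {0: 0, 1: 0}))  # 2
```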
Example For a pushout square
$$\begin{array}{ccc} X(a) & \longrightarrow & X(c) \\ \downarrow & & \downarrow \\ X(b) & \longrightarrow & P \end{array}$$
the discrepancy is
$$|P| - |X(b)| - |X(c)| + |X(a)|.$$
I don’t know what to do with this formula, but it has a certain symmetry. Has anyone come across it before?
I’ll finish by explaining the entropy connection I promised at the start. Here we go!
Ordinary Shannon entropy For this, we need to think about the Shannon entropy not just of probability measures — the usual setting — but of arbitrary measures. The sets we’re considering measures on will always be finite.
A probability measure $p$ on a finite set $\{1, \ldots, n\}$ is just an $n$-tuple $(p_1, \ldots, p_n)$ of nonnegative reals summing to $1$, and its entropy is defined by
$$H(p) = - \sum_i p_i \log p_i,$$
with the convention that $0 \log 0 = 0$. A measure $m$ on $\{1, \ldots, n\}$ is any $n$-tuple $(m_1, \ldots, m_n)$ of nonnegative reals, and the definition of entropy is extended from probability measures to arbitrary measures by homogeneity. That is, writing $M = \sum_i m_i$, the tuple $m/M$ is a probability measure, and we define
$$H(m) = M \cdot H(m/M)$$
(or $H(m) = 0$ if $M = 0$). This is the unique way of extending the definition so that
$$H(\lambda m) = \lambda H(m)$$
for all measures $m$ and all real $\lambda \geq 0$.
Let’s now consider measures $m = (m_1, \ldots, m_n)$ where each $m_i$ is an integer. Such a thing amounts to a map
$$f \colon B \to A$$
of finite sets. To see this, take $A = \{1, \ldots, n\}$, and take $f$ to be a function into $A$ whose $i$-fibre $f^{-1}(i)$ has cardinality $m_i$.
This categorification manoeuvre — upgrading from numbers to sets — brings maps into play. In particular, we get the monoid $\operatorname{End}(B)$ of all functions $B \to B$, and its submonoid $\operatorname{End}_A(B)$ consisting of the endomorphisms of $B$ over $A$. That is, the elements of $\operatorname{End}_A(B)$ are the functions $s \colon B \to B$ such that $f \circ s = f$.
The inclusion of monoids
$$\operatorname{End}_A(B) \hookrightarrow \operatorname{End}(B),$$
like any other monoid homomorphism, induces an action of the domain on the underlying set of the codomain: applying an element $s \in \operatorname{End}_A(B)$ to an element $t \in \operatorname{End}(B)$ gives a new element $s \circ t$. And this monoid action corresponds to a functor
$$\operatorname{End}_A(B) \to \mathbf{FinSet}.$$
(Or “$U(\operatorname{End}(B))$” for the underlying set if you want to be fussy, but I’ll drop the $U$s.)
Question: what’s the magnitude of this functor?
Answer: it’s
$$\frac{|\operatorname{End}(B)|}{|\operatorname{End}_A(B)|} = \frac{|B|^{|B|}}{\prod_i m_i^{m_i}}.$$
This is nothing but $e^{H(m)}$, the exponential of the entropy of our measure! So:
The exponential of entropy is a special case of magnitude.
(Sometimes it seems that the exponential of entropy is a more fundamental quantity than the entropy itself. Apart from anything else, it doesn’t depend on a choice of base. I like to call the exponential of entropy the diversity.)
At least, that’s true when the measure of each point is an integer. But from knowing the entropy of such measures, we can easily obtain the entropy of those where the measure of each point is rational, using the homogeneity property $H(\lambda m) = \lambda H(m)$. And then one can get the completely general case of real measures by extending by continuity.
The basic calculation above was described in an earlier post of mine, but at that time I didn’t see clearly what it meant.
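The count can be checked by brute force for a small measure (the example and helper names are mine): enumerate all endomorphisms of $B$, keep those over $A$, and compare the ratio of the two counts with $e^{H(m)}$.

```python
from itertools import product
from math import exp, log

def entropy(m):
    """Entropy of a measure m = (m_1, ..., m_n), extended homogeneously
    (natural logarithm, with the convention 0 log 0 = 0)."""
    M = sum(m)
    return sum(x * log(M / x) for x in m if x > 0)

# The measure m = (2, 1, 1) as a map f : B -> A with fibre sizes 2, 1, 1.
m = (2, 1, 1)
B = range(sum(m))               # |B| = 4
f = {0: 0, 1: 0, 2: 1, 3: 2}    # fibres {0, 1}, {2}, {3}

endos = list(product(B, repeat=len(B)))                         # all maps B -> B
over_A = [s for s in endos if all(f[s[b]] == f[b] for b in B)]  # f∘s = f

magnitude = len(endos) / len(over_A)   # |End(B)| / |End_A(B)| = 256/4 = 64
print(magnitude, exp(entropy(m)))      # both 64, up to floating point
```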
Relative entropy Given two probability measures $\pi$ and $\rho$ on the same finite set, you can calculate their relative entropy. It’s a measure of how surprised you’d be to observe a distribution of $\pi$ in a sample drawn from a population with distribution $\rho$. (I think of $\rho$ as the “reference” distribution.) The formula for $H(\pi \| \rho)$, the entropy of $\pi$ relative to $\rho$, is
$$H(\pi \| \rho) = \sum_i \pi_i \log \frac{\pi_i}{\rho_i}.$$
Just as for ordinary entropy, we can extend it from probability measures to arbitrary measures. We’re now dealing with two measures $\pi$ and $\rho$ on the same finite set, and I’ll assume they have the same total mass: $\sum_i \pi_i = \sum_i \rho_i$. Again, we extend homogeneously, so that
$$H(\lambda \pi \,\|\, \lambda \rho) = \lambda H(\pi \| \rho)$$
whenever $\lambda \geq 0$.
And as before, we’ll focus on measures taking integer values. A pair of measures on a finite set $A$, both taking integer values and with the same total mass, amounts to a diagram
$$A \xleftarrow{\ f\ } B \xrightarrow{\ g\ } A$$
in $\mathbf{FinSet}$. The idea is that if we write $\pi_i$ and $\rho_i$ for the cardinalities of the $i$-fibres of these two maps, then one of our measures is $\pi = (\pi_i)$ and the other is $\rho = (\rho_i)$. They have the same total mass because
$$\sum_i \pi_i = |B| = \sum_i \rho_i.$$
Calling our two maps $f, g \colon B \to A$, we can consider the set
$$\{ t \colon B \to B \mid g \circ t = f \}$$
of functions $t$ such that $g \circ t = f$. This set is acted on by the monoid
$$\{ s \colon B \to B \mid f \circ s = f \}$$
of functions $s$ such that $f \circ s = f$, by composition.
As we keep seeing, a monoid action corresponds to a functor from that monoid to $\mathbf{FinSet}$, whose magnitude we can compute. In this case, the magnitude is
$$\frac{|\{ t \mid g \circ t = f \}|}{|\{ s \mid f \circ s = f \}|} = \frac{\prod_i \rho_i^{\pi_i}}{\prod_i \pi_i^{\pi_i}},$$
which is equal to
$$\prod_i \left( \frac{\rho_i}{\pi_i} \right)^{\pi_i} = e^{-H(\pi \| \rho)}$$
— the negative exponential of relative entropy.
(We’re never going to be able to get rid of that negative and get $e^{H(\pi \| \rho)}$ itself as a magnitude, since $H(\pi \| \rho)$ can be infinite but the magnitude of a functor can’t be, at least in the context at hand.)
As for ordinary entropy, if we know relative entropy for integer-valued measures then it’s only two short steps to the definition for arbitrary measures.
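The same kind of brute-force check works here (my own small example): count the functions $t$ with $g \circ t = f$, count the monoid elements $s$ with $f \circ s = f$, and compare the ratio with $e^{-H(\pi \| \rho)}$.

```python
from itertools import product
from math import exp, log

def rel_entropy(pi, rho):
    """H(pi || rho) for two measures of equal total mass (natural log)."""
    return sum(p * log(p / r) for p, r in zip(pi, rho) if p > 0)

# Two integer measures on A = {0, 1} with total mass 3, as maps f, g : B -> A.
B = range(3)
f = {0: 0, 1: 0, 2: 1}   # fibre sizes: pi  = (2, 1)
g = {0: 0, 1: 1, 2: 1}   # fibre sizes: rho = (1, 2)

maps = list(product(B, repeat=len(B)))
acted_on = [t for t in maps if all(g[t[b]] == f[b] for b in B)]  # g∘t = f
acting = [s for s in maps if all(f[s[b]] == f[b] for b in B)]    # f∘s = f

magnitude = len(acted_on) / len(acting)              # 2/4
print(magnitude, exp(-rel_entropy((2, 1), (1, 2))))  # both 0.5
```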
Conditional entropy Whereas relative entropy takes as its input a pair of measures on the same set, conditional entropy takes as its input two measures on different sets.
It’s usually presented in terms of random variables: we have random variables $U$ and $V$ taking values in sets $A$ and $A'$ respectively (which for us will be finite), and the conditional entropy $H(V \mid U)$ is defined by
$$H(V \mid U) = \sum_{x \in A} \Pr(U = x) \, H(V \mid U = x).$$
If we write $r = (r_{x y})_{x \in A, \, y \in A'}$ for the joint distribution — so that the $r_{x y}$ are nonnegative reals summing to $1$ — then
$$H(V \mid U) = - \sum_{x, y} r_{x y} \log \frac{r_{x y}}{\sum_{y'} r_{x y'}}.$$
Here $r$ is a probability measure on $A \times A'$. But just as for ordinary and relative entropy, this definition extends to arbitrary measures on $A \times A'$, by scaling homogeneously.
To see how conditional entropy is a special case of magnitude, we follow a path that should be familiar by now.
An integer-valued measure on $A \times A'$ amounts to a map $B \to A \times A'$ of finite sets, or equivalently a diagram
$$A \leftarrow B \to A'$$
in $\mathbf{FinSet}$. In the monoid $\operatorname{End}(B)$ of all endomorphisms of $B$, we can consider the endomorphisms over $A$, or over $A'$, or most restrictively of all, over $A \times A'$. In particular, there is an inclusion of monoids
$$\operatorname{End}_{A \times A'}(B) \hookrightarrow \operatorname{End}_A(B).$$
As for ordinary entropy, this homomorphism induces an action of the domain on the underlying set of the codomain, giving a functor $\operatorname{End}_{A \times A'}(B) \to \mathbf{FinSet}$ whose magnitude we can calculate. It’s
$$\frac{|\operatorname{End}_A(B)|}{|\operatorname{End}_{A \times A'}(B)|} = \frac{\prod_x m_x^{m_x}}{\prod_{x, y} r_{x y}^{r_{x y}}},$$
where I’m using $m_x$ to mean the cardinality of the $x$-fibre of $B \to A$ and $r_{x y}$ to mean the cardinality of the $(x, y)$-fibre of $B \to A \times A'$. This is exactly the exponential of the conditional entropy.
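Once more, the claim can be verified by brute force on a small example of my own devising: count the endomorphisms over $A$ and over $A \times A'$, and compare the ratio with the exponential of the (homogeneously extended) conditional entropy.

```python
from itertools import product
from math import exp, log

# An integer measure on A x A' given by a map h : B -> A x A'.
B = range(4)
h = {0: (0, 0), 1: (0, 0), 2: (0, 1), 3: (1, 0)}   # r_00 = 2, r_01 = r_10 = 1
f = {b: h[b][0] for b in B}                        # composite B -> A; m = (3, 1)

maps = list(product(B, repeat=len(B)))
end_A = [s for s in maps if all(f[s[b]] == f[b] for b in B)]    # over A
end_AA = [s for s in maps if all(h[s[b]] == h[b] for b in B)]   # over A x A'

magnitude = len(end_A) / len(end_AA)   # 27/4 = prod m_x^m_x / prod r_xy^r_xy

# Conditional entropy of the second coordinate given the first, extended
# homogeneously: sum over cells of r_xy * log(m_x / r_xy).
r = {(0, 0): 2, (0, 1): 1, (1, 0): 1}
mx = {0: 3, 1: 1}
H_cond = sum(v * log(mx[x] / v) for (x, y), v in r.items())
print(magnitude, exp(H_cond))  # both 6.75, up to floating point
```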
Conclusion The exponentials of
- ordinary entropy
- the negative of relative entropy
- conditional entropy
all arise as special cases of magnitude — at least for integer-valued measures, but it’s routine to derive the definition for general measures from that special case.
If you know lots about entropy, you might be wondering now: what about mutual information? Is that the magnitude of some set-valued functor too?
I don’t know! Given $B \to A \times A'$, the quantity we’d like to obtain as a magnitude is
$$\frac{\prod_x m_x^{m_x} \cdot \prod_y n_y^{n_y}}{|B|^{|B|} \cdot \prod_{x, y} r_{x y}^{r_{x y}}}$$
— the negative exponential of the mutual information, where $n_y$ denotes the cardinality of the $y$-fibre of $B \to A'$.
I can’t see how to do that in any natural way.
Maybe I’ll come back to this point next time, when I’ll start on the topic that’s actually the title of these posts: comagnitude.
Re: Comagnitude 1
This last formula for mutual information you’d like to obtain as a magnitude looks a lot like the exponential of the discrepancy of a pushout square — but you’ve probably noticed that already?