In this post and the next, I want to try out a new idea and see where it
leads. It goes back to where magnitude began, which was the desire to unify
elementary counting formulas like the inclusion-exclusion principle and the
simple formula for the number of orbits in a free action of a group on a
finite set.
To prepare the ground for comagnitude, I need to present magnitude itself
in a slightly different way from usual. I won’t assume you know anything
about magnitude, but if you do, watch out for something new: a connection
between magnitude and entropy (ordinary, relative and conditional) that I
don’t think has quite been articulated before.
The inclusion-exclusion formula tells us the cardinality of a union, which
in categorical terms is a pushout. The set of orbits of a free action of a
group on a finite set is the colimit of the associated functor $\mathbf{B}G \to \mathbf{FinSet}$. (Here $\mathbf{B}G$ is the one-object category whose maps are the elements of $G$.) Generally, we’re going to think about the cardinality of the colimit of a functor $X: \mathbf{A} \to \mathbf{FinSet}$, where $\mathbf{A}$ is a finite category.
In both the examples I just mentioned, the cardinality of the colimit is a $\mathbb{Q}$-linear combination of the cardinalities of the individual sets $X(a)$, taken over all objects $a$ of $\mathbf{A}$. For inclusion-exclusion, we take
$$\mathbf{A} = (b \leftarrow a \rightarrow c)$$
and get
$$|\mathrm{colim}\, X| = |X(b)| + |X(c)| - |X(a)|.$$
For a free action of a group $G$, we take $\mathbf{A} = \mathbf{B}G$. If the single object of $\mathbf{B}G$ is called $\star$, then
$$|\mathrm{colim}\, X| = \frac{|X(\star)|}{|G|}$$
— the cardinality of the set being acted on, divided by the
order of $G$.
Generally, there’s no hope of computing the cardinality of the colimit of a
functor $X: \mathbf{A} \to \mathbf{FinSet}$ in terms of the cardinalities of the sets $X(a)$ alone. The colimit usually depends on what the functor does to
morphisms, as well as objects. So in order to pull this off, we’re going to
have to confine ourselves to only those functors that are especially
convenient. The hope is that for any $\mathbf{A}$, there are rational numbers $w_a$ ($a \in \mathbf{A}$) such that for all “convenient” functors $X$,
$$|\mathrm{colim}\, X| = \sum_{a \in \mathbf{A}} w_a\, |X(a)|.$$
Here’s how to simultaneously figure out what the coefficients in
this weighted sum must be and what “convenient” should mean.
The starting thought is that most notions of “nice enough” for set-valued
functors include the representables. Since the colimit of any representable
is the one-element set, requiring the last equation to hold for all representables means that
$$\sum_{b} |\mathbf{A}(a, b)|\, w_b = 1$$
for all $a \in \mathbf{A}$.
And now we’re getting somewhere! This is a system of equations with the
same number of equations as unknowns $w_a$, namely, the number of objects of $\mathbf{A}$. And in this situation, there’s typically a unique solution, at least if we work over the field $\mathbb{Q}$.
A family $(w_a)_{a \in \mathbf{A}}$ of rational numbers satisfying
$$\sum_{b} |\mathbf{A}(a, b)|\, w_b = 1$$
for all $a$ is called a weighting on $\mathbf{A}$, and $w_a$ is called the weight of $a$. For the purposes of this post, I’ll assume there’s always exactly one weighting on $\mathbf{A}$. That’s not entirely justified: there are examples of finite categories with no weighting at all, or with more than one. (See
Examples 1.11 of my paper The Euler characteristic of a
category, where all the stuff I’m talking
about right now is worked out.) But it’s safe enough.
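In case it helps to see weightings concretely: finding one is nothing more than solving the linear system above. Here is a minimal numerical sketch (the function name and the examples are just illustrative, not anything from the paper), where `zeta[a][b]` records the number of maps $a \to b$:

```python
import numpy as np

def weighting(zeta):
    """Solve sum_b |A(a,b)| * w_b = 1 for every object a of a finite category A."""
    zeta = np.array(zeta, dtype=float)
    return np.linalg.solve(zeta, np.ones(len(zeta)))

# Pushout shape b <- a -> c, objects ordered (a, b, c):
print(weighting([[1, 1, 1],
                 [0, 1, 0],
                 [0, 0, 1]]))    # [-1.  1.  1.]

# Coequalizer shape, a parallel pair a => b:
print(weighting([[1, 2],
                 [0, 1]]))       # [-1.  1.]

# One-object category of a group of order 6:
print(weighting([[6]]))          # [0.16666667]
```

These are exactly the weightings that will appear in the examples below.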
So, take the weighting $(w_a)_{a \in \mathbf{A}}$ on $\mathbf{A}$. By definition, the class of functors $X$ satisfying
$$|\mathrm{colim}\, X| = \sum_{a} w_a\, |X(a)|$$
contains the representables. But it’s easy to see that it’s also closed
under coproducts. So, this equation — which I’ll call the colimit
formula — holds for all coproducts of representables!
Which functors are coproducts of representables? They’re
sometimes called the “familially representable” functors, the idea being
that a coproduct of representables $\coprod_{i \in I} \mathbf{A}(a_i, -)$ is
represented by a family $(a_i)_{i \in I}$ of objects. But in my last
post,
I called them the “free functors”. The way I put it there was that the
forgetful functor
$$[\mathbf{A}, \mathbf{FinSet}] \to \mathbf{FinSet}^{\mathrm{ob}\, \mathbf{A}}$$
has both a left and a right adjoint, and a functor $X$ is free if and only if it’s in the image of the left adjoint. That means
$$X \cong \coprod_{a \in \mathbf{A}} S_a \times \mathbf{A}(a, -)$$
for some family $(S_a)_{a \in \mathbf{A}}$ of sets.
Examples
When $\mathbf{A}$ is the discrete two-object category $\{a, b\}$, both weights are $1$, and the general colimit formula
$$|\mathrm{colim}\, X| = \sum_{a} w_a\, |X(a)|$$
reduces to something very basic:
$$|X(a) + X(b)| = |X(a)| + |X(b)|$$
— the cardinality of a coproduct is the sum of the individual cardinalities. When $\mathbf{A}$ is a discrete category, all functors are coproducts of representables.
When $\mathbf{A}$ is the shape $(b \leftarrow a \rightarrow c)$ for pushouts, a functor $X: \mathbf{A} \to \mathbf{FinSet}$ is a diagram
$$X(b) \leftarrow X(a) \rightarrow X(c)$$
of finite sets, and is a coproduct of representables if and only if both these maps are injections. Assume $X$ is a coproduct of representables. The weighting on $\mathbf{A}$ is $(w_a, w_b, w_c) = (-1, 1, 1)$, so the colimit formula says that the cardinality of a pushout is
$$|X(b) +_{X(a)} X(c)| = |X(b)| + |X(c)| - |X(a)|$$
— the inclusion-exclusion formula.
For a group $G$ (or more generally, a monoid), the weight of the unique object of $\mathbf{B}G$ is $1/|G|$, the reciprocal of the order. Given an action of $G$ on a finite set, the corresponding functor $\mathbf{B}G \to \mathbf{FinSet}$ is a coproduct of representables if and only if the action is free. In that case, the colimit formula says that the number of orbits is the cardinality of the set divided by $|G|$.
Finally, when $\mathbf{A} = (a \rightrightarrows b)$, the colimit formula says that the coequalizer of a pair of functions $X(a) \rightrightarrows X(b)$ has $|X(b)| - |X(a)|$ elements, provided that the two functions from $X(a)$ to $X(b)$ are injective with disjoint images.
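As a quick sanity check on that last example, here is a small sketch (purely illustrative) that computes the coequalizer of two functions directly, by merging $f(x)$ with $g(x)$ for each $x$, so that the answer can be compared with $|X(b)| - |X(a)|$:

```python
def coequalizer_size(f, g, n_b):
    """Size of the coequalizer of f, g : X(a) -> X(b), where X(b) = {0,...,n_b-1}
    and f, g are lists of images; union-find merges f(x) with g(x)."""
    parent = list(range(n_b))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in zip(f, g):
        parent[find(u)] = find(v)
    return len({find(x) for x in range(n_b)})

# Injective with disjoint images: the formula gives 5 - 2 = 3, and indeed:
print(coequalizer_size([0, 1], [2, 3], 5))   # 3
# Images not disjoint: the formula still gives 3, but the coequalizer has 4 elements.
print(coequalizer_size([0, 1], [1, 0], 5))   # 4
```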
Aside for category theorists Under finiteness conditions, and
assuming that $\mathbf{A}$ is Cauchy complete, a functor $X: \mathbf{A} \to \mathbf{FinSet}$ is a
coproduct of representables if and only if it is flat with respect to
the class of finite connected limits. This might seem more abstract
than the original condition, but it actually gives a concrete, testable
condition for $X$ to be a coproduct of representables. (See Lemma 5.2
and Definition 3.2
here.) This result enables one to see rather quickly
that in all the examples above, the functors arising as coproducts of
representables are exactly what I said they are.
Summary so far We’ve seen that a finite category $\mathbf{A}$ typically has a unique weighting — meaning a family $(w_a)_{a \in \mathbf{A}}$ of rational numbers satisfying
$$\sum_{b} |\mathbf{A}(a, b)|\, w_b = 1$$
for all $a \in \mathbf{A}$. And we’ve seen that when $X: \mathbf{A} \to \mathbf{FinSet}$ is a coproduct of representables, the “colimit formula” holds:
$$|\mathrm{colim}\, X| = \sum_{a} w_a\, |X(a)|.$$
Now, what if $X$ isn’t a coproduct of representables? The colimit formula
won’t usually hold. But the right-hand side of this non-equation still
calculates something. What is it?
Let me define the magnitude of a functor $X: \mathbf{A} \to \mathbf{FinSet}$ by
$$|X| = \sum_{a} w_a\, |X(a)|.$$
This is equal to $|\mathrm{colim}\, X|$ if $X$ is a coproduct of representables, but not in general.
For example, if we take a monoid $M$ acting on a finite set, the magnitude of the corresponding functor $\mathbf{B}M \to \mathbf{FinSet}$ is the cardinality of the set divided by the order of $M$. Unless the action is free, this isn’t usually
the number of orbits, and it might not even be an integer.
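Here is a tiny illustration of that point (the helper is just for the sake of the example), comparing the magnitude $|S|/|G|$ with the actual number of orbits for a free and a non-free action of the two-element group:

```python
def orbit_count(elements, group, act):
    """Number of orbits of a finite group acting on a finite set;
    act(g, x) applies the group element g to the point x."""
    seen, count = set(), 0
    for x in elements:
        if x not in seen:
            count += 1
            seen |= {act(g, x) for g in group}
    return count

C2 = [0, 1]   # the two-element group, written additively mod 2

# Free action on {0,1,2,3}: swap 0<->1 and 2<->3.  Magnitude 4/2 = 2 = number of orbits.
print(orbit_count(range(4), C2, lambda g, x: x ^ g), 4 / 2)

# Non-free action on {0,1,2}: swap 0<->1, fix 2.  Magnitude 3/2, but there are 2 orbits.
print(orbit_count(range(3), C2, lambda g, x: x ^ g if x < 2 else x), 3 / 2)
```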
The functor $\Delta 1: \mathbf{A} \to \mathbf{FinSet}$ sending everything in $\mathbf{A}$ to the one-element set has magnitude
$$|\Delta 1| = \sum_{a} w_a,$$
which is called the magnitude or Euler characteristic $\chi(\mathbf{A})$ of $\mathbf{A}$. (Under finiteness hypotheses, it’s equal to the topological Euler characteristic of the classifying space $B\mathbf{A}$.) Hence the magnitude of a category is a special case of the magnitude of a set-valued functor. Conversely, one can show that the magnitude of a functor $X$ is equal to the magnitude of its category of elements.
So, the concepts of magnitude of a category (already studied
extensively) and the magnitude of
a set-valued functor (a term I think I’m introducing for the first
time here) are each special cases of the other. But in these posts, unlike
everything I’ve written about magnitude before, I’m going to give prime
position to magnitude of set-valued functors rather than of categories.
Does this definition of the magnitude of a set-valued functor have sensible
properties? It does!
First, if $X \cong X'$ are isomorphic then certainly $|X| = |X'|$. Second, and a bit less trivially, magnitude is invariant under equivalence in the following sense: if we compose $X: \mathbf{A} \to \mathbf{FinSet}$ with an equivalence of categories $F: \mathbf{B} \to \mathbf{A}$ then
$$|X \circ F| = |X|.$$
In fact, this holds just as long as $F$ has a left adjoint.
The term “magnitude” suggests it should behave in a cardinality-like way,
and it does that too. For example, given functors $X, Y: \mathbf{A} \to \mathbf{FinSet}$,
$$|X + Y| = |X| + |Y|,$$
where the left-hand side is the magnitude of the coproduct of our two
functors. A bit more generally, suppose we have a diagram of functors and natural transformations
$$Y \leftarrow X \rightarrow Z$$
such that for each $a \in \mathbf{A}$, the functions $X(a) \to Y(a)$ and $X(a) \to Z(a)$ are injective (or more abstractly, the corresponding functor from the pushout shape $(b \leftarrow a \rightarrow c)$ to $[\mathbf{A}, \mathbf{FinSet}]$ is a coproduct of representables). Then one can show that the magnitude of the pushout is given by
$$|Y +_X Z| = |Y| + |Z| - |X|.$$
So magnitude in this sense obeys the inclusion-exclusion formula.
More generally still, there’s a similar result for any shape of colimit, but I won’t write it out here. It’s exactly what
the pushout case suggests.
I hope I’ve driven home the message that although the colimit formula
$$|\mathrm{colim}\, X| = \sum_{a} w_a\, |X(a)|$$
for functors $X: \mathbf{A} \to \mathbf{FinSet}$ holds when $X$ is a coproduct of representables, it doesn’t hold for arbitrary $X$. So perhaps it’s interesting to see how much the two sides of the equation differ when $X$ isn’t a coproduct of representables. In other words, let’s look at the difference
$$|\mathrm{colim}\, X| - \sum_{a} w_a\, |X(a)|.$$
It seems to me that this quantity could be interesting. However, I don’t have a
strong feel for what it means yet, so for now I’ll give it a bland name: the
discrepancy of $X$.
Example The discrepancy of a coequalizer
$$f, g: X(a) \rightrightarrows X(b)$$
is
$$|\mathrm{coeq}(f, g)| - |X(b)| + |X(a)|.$$
We’ve already seen that this is $0$ if our two functions are injective with disjoint images. But in general, it looks rather like an Euler characteristic. In fact, from our coequalizer we can form a chain complex of free abelian groups
$$0 \to \mathbb{Z}^{X(a)} \overset{d}{\to} \mathbb{Z}^{X(b)} \to \mathbb{Z}^{\mathrm{coeq}(f, g)} \to 0,$$
where $d = f_* - g_*$ is induced by the original functions $f$ and $g$, and the last map is induced by the quotient function $X(b) \to \mathrm{coeq}(f, g)$. Then the discrepancy of our coequalizer is equal to the Euler characteristic of this complex.
This complex is exact except maybe at $\mathbb{Z}^{X(a)}$. Let’s do a little check: if $f$ and $g$ are injective with disjoint images then $d$ is injective too, so the complex is exact everywhere, which implies that the Euler characteristic is $0$. Since the discrepancy is also $0$ in this case, we’ve just confirmed one case of the result that the discrepancy of the coequalizer is the Euler characteristic of the resulting complex.
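For what it’s worth, here is a small numerical check of that result. Since the complex can only fail to be exact at $\mathbb{Z}^{X(a)}$, its Euler characteristic is just the dimension of the kernel of $d = f_* - g_*$; the discrepancies quoted in the comments are worked out by hand (the code and the grading convention are only a sketch):

```python
import numpy as np

def euler_char(f, g, n_b):
    """Euler characteristic of 0 -> Z^X(a) -> Z^X(b) -> Z^coeq -> 0, computed as
    dim ker(f_* - g_*), the only homology that can be nonzero."""
    d = np.zeros((n_b, len(f)))
    for j, (u, v) in enumerate(zip(f, g)):
        d[u, j] += 1
        d[v, j] -= 1
    return len(f) - np.linalg.matrix_rank(d)

# f = identity, g = a 3-cycle on X(a) = X(b) = {0,1,2}: the coequalizer has a single
# element, so the discrepancy is 1 - (3 - 3) = 1.
print(euler_char([0, 1, 2], [1, 2, 0], 3))   # 1

# Injective with disjoint images, f(0) = 0 and g(0) = 1 in {0,1,2}: the coequalizer
# has 2 elements, so the discrepancy is 2 - (3 - 1) = 0.
print(euler_char([0], [1], 3))               # 0
```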
Example For a pushout square
$$\begin{array}{ccc} X(a) & \rightarrow & X(c) \\ \downarrow & & \downarrow \\ X(b) & \rightarrow & \mathrm{colim}\, X \end{array}$$
the discrepancy is
$$|\mathrm{colim}\, X| - |X(b)| - |X(c)| + |X(a)|.$$
I don’t know what to do with this formula, but it has a certain
symmetry. Has anyone come across it before?
I’ll finish by explaining the entropy connection I promised at
the start. Here we go!
Ordinary Shannon entropy For this, we need to think about the
Shannon entropy not just of probability measures — the usual
setting — but of arbitrary measures. The sets we’re considering measures on will always be finite.
A probability measure on a finite set $\{1, \ldots, n\}$ is just an $n$-tuple $\mathbf{p} = (p_1, \ldots, p_n)$ of nonnegative reals summing to $1$, and its entropy is defined by
$$H(\mathbf{p}) = -\sum_{i=1}^n p_i \log p_i,$$
with the convention that $0 \log 0 = 0$. A measure $\mu$ on $\{1, \ldots, n\}$ is any $n$-tuple of nonnegative reals, and the definition of entropy is extended from probability measures to arbitrary measures by homogeneity. That is, writing $m = \sum_i \mu_i$, the tuple $\mu/m$ is a probability measure, and we define
$$H(\mu) = m\, H(\mu/m)$$
(or $0$ if $m = 0$). This is the unique way of extending the definition so that
$$H(c \mu) = c\, H(\mu)$$
for all measures $\mu$ and real $c \geq 0$.
Let’s now consider measures $\mu = (\mu_1, \ldots, \mu_n)$ where each $\mu_i$ is an integer. Such a thing amounts to a map
$$\pi: S \to X$$
of finite sets. To see this, take $X = \{1, \ldots, n\}$, and take $\pi$ to be a function into $X$ whose $i$-fibre has cardinality $\mu_i$.
This categorification manoeuvre — upgrading from numbers to sets
— brings maps into play. In particular, we get the monoid $\mathrm{End}(S)$ of all functions $S \to S$, and its submonoid $\mathrm{End}_X(S)$ consisting of the endomorphisms of $S$ over $X$. That is, the elements of $\mathrm{End}_X(S)$ are the functions $\phi: S \to S$ such that $\pi \circ \phi = \pi$.
The inclusion of monoids
$$\mathrm{End}_X(S) \hookrightarrow \mathrm{End}(S),$$
like any other monoid homomorphism, induces an action of the domain on the underlying set of the codomain: applying an element $\phi \in \mathrm{End}_X(S)$ to an element $\theta \in \mathrm{End}(S)$ gives a new element $\phi \circ \theta$. And this monoid action corresponds to a functor
$$\mathbf{B}\,\mathrm{End}_X(S) \to \mathbf{Set}.$$
(Or “$\mathbf{FinSet}$” if you want to be fussy, but I’ll drop the $\mathbf{Fin}$s.)
Question: what’s the magnitude of this functor?
Answer: it’s
$$\frac{m^m}{\mu_1^{\mu_1} \cdots \mu_n^{\mu_n}},$$
where $m = \mu_1 + \cdots + \mu_n = |S|$. This is nothing but $e^{H(\mu)}$, the exponential of the entropy of our
measure! So:
The exponential of entropy is a special case of magnitude.
(Sometimes it seems that the exponential of entropy is a more fundamental quantity than the entropy itself. Apart from anything else, it doesn’t depend on a choice of base. I like to call the exponential of entropy the diversity.)
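If you’d like to check that identity numerically, here is a minimal sketch (natural logarithms throughout, and every $\mu_i \geq 1$ so that no conventions about $0^0$ are needed):

```python
import numpy as np

def magnitude_of_measure(mu):
    """m^m / prod_i mu_i^mu_i for an integer-valued measure mu, where m = sum_i mu_i."""
    mu = np.array(mu, dtype=float)
    m = mu.sum()
    return m**m / np.prod(mu**mu)

def exp_entropy(mu):
    """exp H(mu), with H extended homogeneously: H(mu) = m * H(mu/m)."""
    mu = np.array(mu, dtype=float)
    m = mu.sum()
    p = mu / m
    return np.exp(m * -np.sum(p * np.log(p)))

print(magnitude_of_measure([2, 3, 5]), exp_entropy([2, 3, 5]))   # both about 29629.6
```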
At least, that’s true when the measure of each point is an integer. But
from knowing the entropy of such measures, we can easily obtain the entropy
of those where the measure of each point is rational, assuming the homogeneity property $H(c\mu) = c\, H(\mu)$. And then one can get the
completely general case of real measures by extending by continuity.
The basic calculation above was described in an earlier
post
of mine, but at that time I didn’t see clearly what it meant.
Relative entropy Given two probability measures $\mathbf{p}$ and $\mathbf{q}$ on the same finite set, you can calculate their relative entropy. It’s a measure of how surprised you’d be to observe a distribution of $\mathbf{p}$ in a sample drawn from a population with distribution $\mathbf{q}$. (I think of $\mathbf{q}$ as the “reference” distribution.) The formula for $H(\mathbf{p} \| \mathbf{q})$, the entropy of $\mathbf{p}$ relative to $\mathbf{q}$, is
$$H(\mathbf{p} \| \mathbf{q}) = \sum_i p_i \log \frac{p_i}{q_i}.$$
Just as for ordinary entropy, we can extend it from probability measures to arbitrary measures. We’re now dealing with two measures $\mathbf{a}$ and $\mathbf{b}$ on the same finite set, and I’ll assume they have the same total mass: $\sum_i a_i = \sum_i b_i$. Again, we extend homogeneously, so that
$$H(c\,\mathbf{a} \| c\,\mathbf{b}) = c\, H(\mathbf{a} \| \mathbf{b})$$
whenever $c \geq 0$.
And as before, we’ll focus on measures taking integer values. A pair of
measures on a finite set $X$, both taking integer values and with the same total mass, amounts to a diagram
$$S \rightrightarrows X$$
in $\mathbf{Set}$. The idea is that if we take $X = \{1, \ldots, n\}$ and write $a_i$ and $b_i$ for the cardinalities of the $i$-fibres of these two maps, then one of our measures is $(a_1, \ldots, a_n)$ and the other is $(b_1, \ldots, b_n)$. They have the same total mass because
$$\sum_i a_i = |S| = \sum_i b_i.$$
Calling our two maps $f, g: S \to X$, we can consider the set of functions $\phi: S \to S$ such that $g \circ \phi = f$. This set is acted on by the monoid of functions $\psi: S \to S$ such that $f \circ \psi = f$, by composition.
As we keep seeing, a monoid action corresponds to a functor from that
monoid to $\mathbf{Set}$, whose magnitude we can compute. In this case, the magnitude is
$$\frac{\prod_i b_i^{a_i}}{\prod_i a_i^{a_i}},$$
which is equal to
$$e^{-H(\mathbf{a} \| \mathbf{b})}$$
—the negative exponential of relative entropy.
(We’re never going to be able to get rid of that negative and get $e^{H(\mathbf{a} \| \mathbf{b})}$ itself as a magnitude, since $e^{H(\mathbf{a} \| \mathbf{b})}$ can be infinite but the magnitude of a functor can’t be, at least in the context at hand.)
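Here is a brute-force check of that calculation on a tiny example (the labels follow the notation above; the particular values are arbitrary): enumerate the set and the monoid directly, and compare their ratio with the negative exponential of relative entropy.

```python
import itertools
import numpy as np

# A parallel pair f, g : S -> X with S = {0,1,2,3}, X = {0,1};
# fibre counts a = (1, 3) for f and b = (2, 2) for g.
S, f, g = range(4), [0, 1, 1, 1], [0, 0, 1, 1]

endomaps  = list(itertools.product(S, repeat=len(S)))
acted_set = [phi for phi in endomaps if all(g[phi[s]] == f[s] for s in S)]
monoid    = [psi for psi in endomaps if all(f[psi[s]] == f[s] for s in S)]
magnitude = len(acted_set) / len(monoid)

a, b = np.array([1.0, 3.0]), np.array([2.0, 2.0])
rel_entropy = np.sum(a * np.log(a / b))            # H(a || b), extended homogeneously
print(magnitude, np.exp(-rel_entropy))             # both 16/27 = 0.592...
```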
As for ordinary entropy, if we know relative entropy for integer-valued
measures then it’s only two short steps to the definition for arbitrary
measures.
Conditional entropy Whereas relative entropy takes as its input a
pair of measures on the same set, conditional entropy takes as its input
two measures on different sets.
It’s usually presented in terms of random variables: we have random
variables $X$ and $Y$ taking values in sets $\mathcal{X}$ and $\mathcal{Y}$ respectively (which for us will be finite), and the conditional entropy $H(X|Y)$ is defined by
$$H(X|Y) = \sum_{y \in \mathcal{Y}} P(Y = y)\, H(X \mid Y = y).$$
If we write $r = (r_{ij})$ for the joint distribution — so that the $r_{ij} = P(X = i,\ Y = j)$ are nonnegative reals summing to $1$ — then
$$H(X|Y) = -\sum_{i, j} r_{ij} \log \frac{r_{ij}}{\sum_k r_{kj}}.$$
Here $r$ is a probability measure on $\mathcal{X} \times \mathcal{Y}$. But just as for ordinary and relative entropy, this definition extends to arbitrary measures on $\mathcal{X} \times \mathcal{Y}$, by scaling homogeneously.
To see how conditional entropy is a special case of magnitude, we follow a
path that should be familiar by now.
An integer-valued measure on $\mathcal{X} \times \mathcal{Y}$ amounts to a map of finite sets $S \to \mathcal{X} \times \mathcal{Y}$, or equivalently a diagram
$$\mathcal{X} \leftarrow S \rightarrow \mathcal{Y}$$
in $\mathbf{Set}$. In the monoid $\mathrm{End}(S)$ of all endomorphisms of $S$, we can consider the endomorphisms over $\mathcal{X}$, or over $\mathcal{Y}$, or most restrictively of all, over $\mathcal{X} \times \mathcal{Y}$. In particular, there is an inclusion of monoids
$$\mathrm{End}_{\mathcal{X} \times \mathcal{Y}}(S) \hookrightarrow \mathrm{End}_{\mathcal{Y}}(S).$$
As for ordinary entropy, this homomorphism induces an action of the domain on the underlying set of the codomain, giving a functor whose magnitude we can calculate. It’s
$$\frac{\prod_j s_j^{s_j}}{\prod_{i,j} r_{ij}^{r_{ij}}},$$
where I’m using $r_{ij}$ to mean the cardinality of the $(i,j)$-fibre of $S \to \mathcal{X} \times \mathcal{Y}$ and $s_j$ to mean the cardinality of the $j$-fibre of $S \to \mathcal{Y}$. This is exactly the exponential of the conditional entropy.
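And a last numerical check in the same spirit: the ratio $\prod_j s_j^{s_j} \big/ \prod_{i,j} r_{ij}^{r_{ij}}$ against the exponential of the homogeneously-extended conditional entropy, for a small integer-valued measure $r$:

```python
import numpy as np

# Integer-valued measure r on X x Y (rows indexed by i in X, columns by j in Y).
r = np.array([[1.0, 2.0],
              [3.0, 4.0]])
s = r.sum(axis=0)                              # s_j = cardinality of the j-fibre over Y

magnitude = np.prod(s**s) / np.prod(r**r)      # |End_Y(S)| / |End_{X x Y}(S)|

# Conditional entropy H(X|Y), extended homogeneously to the integer measure r.
cond_entropy = -np.sum(r * np.log(r / s))      # s broadcasts across the rows
print(magnitude, np.exp(cond_entropy))         # both 432.0
```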
Conclusion The exponentials of
- ordinary entropy
- the negative of relative entropy
- conditional entropy
all arise as special cases of magnitude — at least for
integer-valued measures, but it’s routine to derive the definition for
general measures from that special case.
If you know lots about entropy, you might be wondering now: what about
mutual information? Is that the magnitude of some set-valued functor too?
I don’t know! Given an integer-valued measure $r = (r_{ij})$ on $\mathcal{X} \times \mathcal{Y}$, the quantity we’d like to obtain as a magnitude is the exponential of the mutual information,
$$\prod_{i, j} \left( \frac{m\, r_{ij}}{r_{i \bullet}\, r_{\bullet j}} \right)^{r_{ij}},$$
where $m = \sum_{i,j} r_{ij}$, $r_{i \bullet} = \sum_j r_{ij}$ and $r_{\bullet j} = \sum_i r_{ij}$.
I can’t see how to do that in any natural way.
Maybe I’ll come back to this point next time, when I’ll start on the
topic that’s actually the title of these posts: comagnitude.