Skip to the Main Content

Note:These pages make extensive use of the latest XHTML and CSS Standards. They ought to look great in any standards-compliant modern browser. Unfortunately, they will probably look horrible in older browsers, like Netscape 4.x and IE 4.x. Moreover, many posts use MathML, which is, currently only supported in Mozilla. My best suggestion (and you will thank me when surfing an ever-increasing number of sites on the web which have been crafted to use the new standards) is to upgrade to the latest version of your browser. If that's not possible, consider moving to the Standards-compliant and open-source Mozilla browser.

January 22, 2025

Comagnitude 1

Posted by Tom Leinster

In this post and the next, I want to try out a new idea and see where it leads. It goes back to where magnitude began, which was the desire to unify elementary counting formulas like the inclusion-exclusion principle and the simple formula for the number of orbits in a free action of a group on a finite set.

To prepare the ground for comagnitude, I need to present magnitude itself in a slightly different way from usual. I won’t assume you know anything about magnitude, but if you do, watch out for something new: a connection between magnitude and entropy (ordinary, relative and conditional) that I don’t think has quite been articulated before.

The inclusion-exclusion formula tells us the cardinality of a union, which in categorical terms is a pushout. The set of orbits of a free action of a group GG on a finite set is the colimit of the associated functor BGFinSetB G \to FinSet. (Here BGB G is the one-object category whose maps are the elements of GG.) Generally, we’re going to think about the cardinality of the colimit of a functor X:AFinSetX: A \to FinSet, where AA is a finite category.

In both the examples I just mentioned, the cardinality |colimX||colim X| of the colimit is a \mathbb{Q}-linear combination of the cardinalities |X(a)||X(a)| of the individual sets X(a)X(a), taken over all objects aAa \in A. For inclusion-exclusion, we take

A=(012) A = (0 \leftarrow 1 \rightarrow 2)

and get

|colim(X)|=|X(0)||X(1)|+|X(2)|. |colim(X)| = |X(0)| - |X(1)| + |X(2)|.

For a free action of a group GG, we take A=BGA = B G. If the single object of AA is called aa, then

|colim(X)|=|X(a)|/o(G) |colim(X)| = |X(a)|/o(G)

— the cardinality of the set X(a)X(a) being acted on, divided by the order of GG.

Generally, there’s no hope of computing the cardinality of the colimit of a functor X:AFinSetX: A \to FinSet in terms of the cardinalities of the sets X(a)X(a) alone. The colimit usually depends on what the functor XX does to morphisms, as well as objects. So in order to pull this off, we’re going to have to confine ourselves to only those functors XX that are especially convenient. The hope is that for any AA, there are rational numbers w A(a)w_A(a) (aAa \in A) such that for all “convenient” functors X:AFinSetX: A \to FinSet,

|colim(X)|= aAw A(a)|X(a)|. |colim(X)| = \sum_{a \in A} w_A(a) |X(a)|.

Here’s how to simultaneously figure out what the coefficients w A(a)w_A(a) in this weighted sum must be and what “convenient” should mean.

The starting thought is that most notions of “nice enough” for set-valued functors include the representables. Since the colimit of any representable is the one-element set, for the last equation to hold for all representables A(b,)A(b, -) means that

1= aAw A(a)|A(b,a)| 1 = \sum_{a \in A} w_A(a) |A(b, a)|

for all bAb \in A.

And now we’re getting somewhere! This is a system of equations with the same number of equations as unknowns w A(a)w_A(a), namely, the number of objects of AA. And in this situation, there’s typically a unique solution, at least if we work over the field \mathbb{Q}.

A family (w A(a)) aA(w_A(a))_{a \in A} of rational numbers satisfying

aA|A(b,a)|w A(a)=1 \sum_{a \in A} |A(b, a)| w_A(a) = 1

for all bAb \in A is called a weighting on AA, and w A(a)w_A(a) is called the weight of aa. For the purposes of this post, I’ll assume there’s always exactly one weighting on AA. That’s not entirely justified: there are examples of finite categories with no weighting or finitely many. (See Examples 1.11 of my paper The Euler characteristic of a category, where all the stuff I’m talking about right now is worked out.) But it’s safe enough.

So, take the weighting w Aw_A on AA. By defintion, the class of functors X:AFinSetX: A \to FinSet satisfying

|colim(X)|= aAw A(a)|X(a)| |colim(X)| = \sum_{a \in A} w_A(a) |X(a)|

contains the representables. But it’s easy to see that it’s also closed under coproducts. So, this equation — which I’ll call the colimit formula — holds for all coproducts of representables!

Which functors ASetA \to Set are coproducts of representables? They’re sometimes called the “familially representable” functors, the idea being that a coproduct of representables iIA(a i,)\sum_{i \in I} A(a_i, -) is represented by a family of objects (a i) iI(a_i)_{i \in I}. But in my last post, I called them the “free functors”. The way I put it there was that the forgetful functor

Set ASet obA Set^A \to Set^{ob\ A}

has both a left and a right adjoint, and a functor XSet AX \in Set^A is free if and only if it’s in the image of the left adjoint. That means

X aAS(a)×A(a,) X \cong \sum_{a \in A} S(a) \times A(a, -)

for some family of sets (S(a)) aA(S(a))_{a \in A}.

Examples

  • When AA is the discrete two-object category {0,1}\{0, 1\}, both weights are 11, and the general colimit formula

    |colim(X)|= aw A(a)|X(a)| |colim(X)| = \sum_a w_A(a) |X(a)|

    reduces to something very basic:

    |X(0)+X(1)|=|X(0)|+|X(1)| |X(0) + X(1)| = |X(0)| + |X(1)|

    — the cardinality of a coproduct is the sum of the individual cardinalities. When AA is a discrete category, all functors ASetA \to Set are coproducts of representables.

  • When AA is the shape (012)(0 \leftarrow 1 \rightarrow 2) for pushouts, a functor X:AFinSetX: A \to FinSet is a diagram

    (X(0)X(1)X(2)) (X(0) \leftarrow X(1) \rightarrow X(2))

    of finite sets, and XX is a coproduct of representables if and only if both these maps are injections. Assume XX is a coproduct of representables. The weighting on AA is (1,1,1)(1, -1, 1), so the colimit formula says that the cardinality of a pushout is

    |X(0)||X(1)|+|X(2)| |X(0)| - |X(1)| + |X(2)|

    — the inclusion-exclusion formula.

  • For a group GG (or more generally, a monoid), the weight of the unique object of BGB G is 1/o(G)1/o(G), the reciprocal of the order. Given an action of GG on a a finite set, the corresponding functor BGFinSetB G \to FinSet is a coproduct of representables if and only if the action is free. In that case, the colimit formula says that the number of orbits is the cardinality of the set divided by o(G)o(G).

  • Finally, when A=(01)A = (0 \rightrightarrows 1), the colimit formula says that the coequalizer of X(0)X(1)X(0) \rightrightarrows X(1) has |X(1)||X(0)||X(1)| - |X(0)| elements, provided that the two functions from X(0)X(0) to X(1)X(1) are injective with disjoint images.

Aside for category theorists   Under finiteness conditions, and assuming that AA is Cauchy complete, a functor X:ASetX: A \to Set is a coproduct of representables if and only if it is flat with respect to the class of finite connected limits. This might seem more abstract than the original condition, but it actually gives a concrete, testable condition for XX to be a coproduct of representables. (See Lemma 5.2 and Definition 3.2 here.) This result enables one to see rather quickly that in all the examples above, the functors arising as coproducts of representables are exactly what I said they are.

 

Summary so far  We’ve seen that a finite category AA typically has a unique weighting w Aw_A — meaning a family (w A(a)) aA(w_A(a))_{a \in A} of rational numbers satisfying

a|A(a,b)|w A(a)=1 \sum_a |A(a, b)| w_A(a) = 1

for all bAb \in A. And we’ve seen that when X:AFinSetX: A \to FinSet is a coproduct of representables, the “colimit formula” holds:

|colim(X)|= aw A(a)|X(a)|. |colim(X)| = \sum_a w_A(a) |X(a)|.

Now, what if XX isn’t a coproduct of representables? The colimit formula won’t usually hold. But the right-hand side of this non-equation still calculates something. What is it?

Let me define the magnitude |X||X| of a functor X:AFinSetX: A \to FinSet by

|X|= aAw A(a)|X(a)|. |X| = \sum_{a \in A} w_A(a) |X(a)| \in \mathbb{Q}.

This is equal to |colim(X)||colim(X)| if XX is a coproduct of representables, but not in general.

For example, if we take a monoid MM acting on a finite set, the magnitude of the corresponding functor BMFinSetB M \to FinSet is the cardinality of the set divided by the order of MM. Unless the action is free, this isn’t usually the number of orbits, and it might not even be an integer.

The functor Δ1:AFinSet\Delta 1: A \to FinSet sending everything in AA to the one-element set has magnitude

|Δ1|= aAw A(a), |\Delta 1| = \sum_{a \in A} w_A(a),

which is called the magnitude or Euler characteristic of AA. (Under finiteness hypothesis, it’s equal to the topological Euler characteristic of the classifying space in AA.) Hence the magnitude of a category is a special case of the magnitude of a set-valued functor. Conversely, one can show that the magnitude of a functor X:AFinSetX: A \to FinSet is equal to the magnitude of its category of elements.

So, the concepts of magnitude of a category (already studied extensively) and the magnitude of a set-valued functor (a term I think I’m introducing for the first time here) are each special cases of the other. But in these posts, unlike everything I’ve written about magnitude before, I’m going to give prime position to magnitude of set-valued functors rather than of categories.

 

Does this definition of the magnitude of a set-valued functor have sensible properties? It does!

First, if X,Y:AFinSetX, Y: A \to FinSet are isomorphic then certainly |X|=|Y||X| = |Y|. Second, and a bit less trivially, magnitude is invariant under equivalence in the following sense: if we compose X:AFinSetX: A \to FinSet with an equivalence of categories G:BAG: B \to A then

|XG|=|X|. |X \circ G| = |X|.

In fact, this holds just as long as GG has a left adjoint.

The term “magnitude” suggests it should behave in a cardinality-like way, and it does that too. For example, given functors X,Y:AFinSetX, Y: A \to FinSet,

|X+Y|=|X|+|Y|, |X + Y| = |X| + |Y|,

where the left-hand side is the magnitude of the coproduct of our two functors. A bit more generally, suppose we have a diagram of functors and natural transformations

XYZ X \leftarrow Y \rightarrow Z

such that for each aAa \in A, the functions

X(a)Y(a)Z(a) X(a) \leftarrow Y(a) \rightarrow Z(a)

are injective with disjoint images (or more abstractly, the corresponding functor from (012)(0 \leftarrow 1 \rightarrow 2) to FinSetFinSet is a coproduct of representables). Then one can show that the magnitude of the pushout X+ YZX +_Y Z is given by

|X+ YZ|=|X||Y|+|Z|. |X +_Y Z| = |X| - |Y| + |Z|.

So magnitude in this sense obeys the inclusion-exclusion formula. More generally still, there’s a similar result for any shape of colimit, but I won’t write it out here. It’s exactly what the pushout case suggests.

 

I hope I’ve driven home the message that although the colimit formula

|colim(X)|=|X| |colim(X)| = |X|

for functors X:AFinSetX: A \to FinSet holds when XX is a coproduct of representables, it doesn’t hold for arbitrary XX. So perhaps it’s interesting to see how much the two sides of the equation differ when XX isn’t a coproduct of representables. In other words, let’s look at the difference

|colim(X)||X|=|colim(X)| aw A(a)|X(a)|. |colim(X)| - |X| = |colim(X)| - \sum_a w_A(a) |X(a)| \in \mathbb{Q}.

It seems to me that this quantity could be interesting. However, I don’t have a strong feel for what it means yet, so for now I’ll give it a bland name: the discrepancy of XX.

Example  The discrepancy of a coequalizer

X 0X 1C X_0 \rightrightarrows X_1 \to C

is

|C|(|X 1||X 0|)=|X 0||X 1|+|C|. |C| - (|X_1| - |X_0|) = |X_0| - |X_1| + |C|.

We’ve already seen that this is 00 if our two functions f,g:X 0X 1f, g: X_0 \rightrightarrows X_1 are injective with disjoint images. But in general, it looks rather like an Euler characteristic. In fact, from our coequalizer we can form a chain complex of free abelian groups

0X 0gfX 1hC0 0 \to \mathbb{Z} X_0 \stackrel{\mathbb{Z}g - \mathbb{Z}f}{\to} \mathbb{Z} X_1 \stackrel{\mathbb{Z}h}{\to} \mathbb{Z} C \to 0

where hh is the original function X 1CX_1 \to C. Then the discrepancy of our coequalizer is equal to the Euler characteristic of this complex.

This complex is exact except maybe at X 0\mathbb{Z} X_0. Let’s do a little check: if ff and gg are injective with disjoint images then gf\mathbb{Z}g - \mathbb{Z}f is injective too, so the complex is exact everywhere, which implies that the Euler characteristic is 00. Since the discrepancy is also 00 in this case, we’ve just confirmed one case of the result that the discrepancy of the coequalizer is the Euler characteristic of the resulting complex.

Example  For a pushout square

X 1 X 0 X 2 P, \begin{array}{ccc} X_1 &\rightarrow &X_0 \\ \downarrow & &\downarrow \\ X_2 &\rightarrow &P, \end{array}

the discrepancy is

|P|(|X 0||X 1|+|X 2|)=|X 1||X 0||X 2|+|P|. |P| - (|X_0| - |X_1| + |X_2|) = |X_1| - |X_0| - |X_2| + |P|.

I don’t know what to do with this formula, but it has a certain symmetry. Has anyone come across it before?

 

I’ll finish by explaining the entropy connection I promised at the start. Here we go!

 

Ordinary Shannon entropy  For this, we need to think about the Shannon entropy not just of probability measures — the usual setting — but of arbitrary measures. The sets we’re considering measures on will always be finite.

A probability measure pp on a finite set {1,,n}\{1, \ldots, n\} is just an nn-tuple (p 1,,p n)(p_1, \ldots, p_n) of nonnegative reals summing to 11, and its entropy H(p)H(p) is defined by

H(p)= ip ilogp i, H(p) = -\sum_i p_i \log p_i,

with the convention that 0log0=00\ \log\ 0 = 0. A measure a=(a 1,,a n)a = (a_1, \ldots, a_n) on {1,,n}\{1, \ldots, n\} is any nn-tuple of nonnegative reals, and the definition of entropy is extended from probability measures to arbitrary measures by homogeneity. That is, writing a=a i\|a\| = \sum a_i, the tuple a/aa/\|a\| is a probability measure, and we define

H(a)=aH(a/a) H(a) = \|a\| H(a/\|a\|)

(or H(a)=0H(a) = 0 if a=0\|a\| = 0). This is the unique way of extending the definition so that

H(λa)=λH(a) H(\lambda a) = \lambda H(a)

for all measures aa and real λ0\lambda \geq 0.

Let’s now consider measures a=(a 1,,a n)a = (a_1, \ldots, a_n) where each a ia_i is an integer. Such a thing amounts to a map

AπI A \stackrel{\pi}{\to} I

of finite sets. To see this, take I={1,,n}I = \{1, \ldots, n\}, and take AπIA \stackrel{\pi}{\to} I to be a function into II whose ii-fibre has cardinality a ia_i.

This categorification manoeuvre — upgrading from numbers to sets — brings maps into play. In particular, we get the monoid End(A)End(A) of all functions AAA \to A, and its submonoid End I(A)End_I(A) consisting of the endomorphisms of AA over II. That is, the elements of End I(A)End_I(A) are the functions f:AAf: A \to A such that πf=π\pi \circ f = \pi.

The inclusion of monoids

End I(A)End(A), End_I(A) \hookrightarrow End(A),

like any other monoid homomorphism, induces an action of the domain on the underlying set of the codomain: applying an element fEnd I(A)f \in End_I(A) to an element gEnd(A)g \in End(A) gives a new element fgEnd(A)f \circ g \in End(A). And this monoid action corresponds to a functor

End I(A)FinSet. End_I(A) \to FinSet.

(Or “BEnd I(A)FinSetB End_I(A) \to FinSet” if you want to be fussy, but I’ll drop the BBs.)

Question: what’s the magnitude of this functor?

Answer: it’s

|End(A)||End I(A)|=a a ia i a i= i(aa i) a i. \frac{|End(A)|}{|End_I(A)|} = \frac{\|a\|^{\|a\|}}{\prod_i a_i^{a_i}} = \prod_i \biggl( \frac{\|a\|}{a_i} \biggr)^{a_i}.

This is nothing but exp(H(a))\exp(H(a)), the exponential of the entropy of our measure! So:

The exponential of entropy is a special case of magnitude.

(Sometimes it seems that the exponential of entropy is a more fundamental quantity than the entropy itself. Apart from anything else, it doesn’t depend on a choice of base. I like to call the exponential of entropy the diversity.)

At least, that’s true when the measure of each point is an integer. But from knowing the entropy of such measures, we can easily obtain the entropy of those where the entropy of each point is rational, assuming the homogeneity property H(λa)=λH(a)H(\lambda a) = \lambda H(a). And then one can get the completely general case of real measures by extending by continuity.

The basic calculation above was described in an earlier post of mine, but at that time I didn’t see clearly what it meant.

 

Relative entropy  Given two probability measures pp and rr on the same finite set, you can calculate their relative entropy. It’s a measure of how surprised you’d be to observe a distribution of pp in a sample drawn from a population with distribution rr. (I think of rr as the “reference” distribution.) The formula for H(pr)H(p\|r), the entropy of pp relative to rr, is

H(pr)= ip ilog(p i/r i)[0,]. H(p\|r) = \sum_i p_i \log(p_i/r_i) \in [0, \infty].

Just as for ordinary entropy, we can extend it from probability measures to arbitrary measures. We’re now dealing with two measures aa and bb on the same finite set, and I’ll assume they have the same total mass: a=b\|a\| = \|b\|. Again, we extend homogeneously, so that

H(λaλb)=λH(ab) H(\lambda a \| \lambda b) = \lambda H(a\|b)

whenever λ0\lambda \geq 0.

And as before, we’ll focus on measures taking integer values. A pair of measures on a finite set II, both taking integer values and with the same total mass, amounts to a diagram

AI A \rightrightarrows I

in FinSetFinSet. The idea is that if we take I={1,,n}I = \{1, \ldots, n\} and write a ia_i and b ib_i for the cardinalities of the fibres of the ii-fibres of these two maps, then one of our measures is (a 1,,a n)(a_1, \ldots, a_n) and the other is (b 1,,b n)(b_1, \ldots, b_n). They have the same total mass because

a 1++a n=|A|=b 1++b n. a_1 + \cdots + a_n = |A| = b_1 + \cdots + b_n.

Calling our two maps π,ρ:AI\pi, \rho: A \to I, we can consider the set

Hom I(AπI,AρI) Hom_I(A \stackrel{\pi}{\to} I, A \stackrel{\rho}{\to} I)

of functions g:AAg : A \to A such that ρg=π\rho \circ g = \pi. This set is acted on by the monoid

End I(AπI) End_I(A \stackrel{\pi}{\to} I)

of functions f:AAf : A \to A such that πf=π\pi \circ f = \pi, by composition.

As we keep seeing, a monoid action corresponds to a functor from that monoid to SetSet, whose magnitude we can compute. In this case, the magnitude is

|Hom I(AπI,AρI)||End I(AπI)|=b i a ia i=(a ib i) a i, \frac{|Hom_I(A \stackrel{\pi}{\to} I, A \stackrel{\rho}{\to} I)|}{|End_I(A \stackrel{\pi}{\to} I)|} = \frac{\prod b_i^{a_i}}{\prod a_i} = \prod \biggl( \frac{a_i}{b_i} \biggr)^{a_i},

which is equal to

exp(H(ab)) \exp(-H(a \| b))

—the negative exponential of relative entropy.

(We’re never going to be able to get rid of that negative and get exp(H(ab))\exp(H(a \| b)) itself as a magnitude, since H(ab)H(a\|b) can be infinite but that the magnitude of a functor can’t be, at least in the context at hand.)

As for ordinary entropy, if we know relative entropy for integer-valued measures then it’s only two short steps to the definition for arbitrary measures.

 

Conditional entropy  Whereas relative entropy takes as its input a pair of measures on the same set, conditional entropy takes as its input two measures on different sets.

It’s usually presented in terms of random variables: we have random variables XX and YY taking values in sets II and JJ respectively (which for us will be finite), and the conditional entropy H(X|Y)H(X | Y) is defined by

H(X|Y)= iI,jJP(X=i,Y=j)log1P(X=i|Y=j). H(X | Y) = \sum_{i \in I, j \in J} P(X = i, Y = j) \log \frac{1}{P(X = i | Y = j)}.

If we write (p ij) iI,jJ(p_{i j})_{i \in I, j \in J} for the joint distribution — so that the p ijp_{i j} are nonnegative reals summing to 11 — then

H(X|Y)= i,jp ijlog(p jp ij). H(X | Y) = \sum_{i, j} p_{i j} \log \biggl( \frac{p_j}{p_{i j}} \biggr).

Here pp is a probability measure on I×JI \times J. But just as for ordinary and relative entropy, this definition extends to arbitrary measures aa on I×JI \times J, by scaling homogeneously.

To see how conditional entropy is a special case of magnitude, we follow a path that should be familiar by now.

An integer-valued measure on I×JI \times J amounts to a map AI×JA \to I \times J of finite sets, or equivalently a diagram

IAJ I \leftarrow A \rightarrow J

in FinSetFinSet. In the monoid of all endomorphisms of AA, we can consider the endomorphisms over II, or over JJ, or most restrictively of all, over I×JI \times J. In particular, there is an inclusion of monoids

End I×J(A)End J(A). End_{I \times J}(A) \hookrightarrow End_J(A).

As for ordinary entropy, this homomorphism induces an action of the domain on the underlying set of the codomain, giving a functor End I×J(A)FinSetEnd_{I \times J}(A) \to FinSet whose magnitude we can calculate. It’s

|End J(A)||End I×J(A)|= ja j a j i,ja ij a ij= i,j(a ja ij) a ij, \frac{|End_J(A)|}{|End_{I \times J}(A)|} = \frac{\prod_j a_j^{a_j}}{\prod_{i, j} a_{i j}^{a_{i j}}} = \prod_{i, j} \biggl( \frac{a_j}{a_{i j}} \biggr)^{a_{i j}},

where I’m using a ija_{i j} to mean the cardinality of the (i,j)(i, j)-fibre of AI×JA \to I \times J and a ja_j to mean the cardinality of the jj-fibre of AJA \to J. This is exactly the exponential of the conditional entropy.

 

Conclusion   The exponentials of

  • ordinary entropy
  • the negative of relative entropy
  • conditional entropy

all arise as special cases of magnitude — at least for integer-valued measures, but it’s routine to derive the definition for general measures from that special case.

 

If you know lots about entropy, you might be wondering now: what about mutual information? Is that the magnitude of some set-valued functor too?

I don’t know! Given AI×JA \to I \times J, the quantity we’d like to obtain as a magnitude is

|End(A)||End I×J(A)||End I(A)||End J(A)|. \frac{|End(A)| |End_{I \times J}(A)|}{|End_I(A)| |End_J(A)|}.

I can’t see how to do that in any natural way.

Maybe I’ll come back to this point next time, when I’ll start on the topic that’s actually the title of these posts: comagnitude.

Posted at January 22, 2025 9:45 PM UTC

TrackBack URL for this Entry:   https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/3585

2 Comments & 0 Trackbacks

Re: Comagnitude 1

This last formula for mutual information you’d like to obtain as a magnitude looks a lot like the exponential of the discrepancy of a pushout square — but you’ve probably noticed that already?

Posted by: Quentin Aristote on January 23, 2025 10:09 PM | Permalink | Reply to this

Re: Comagnitude 1

Absolutely! Mutual information can be seen as the codiscrepancy of a pullback square, which is what I was intending to mention next time.

I haven’t defined “codiscrepancy” yet, but it’s going to be something multiplicative. Whereas the discrepancy of a pushout square is an alternating sum of the four cardinalities, the codiscrepancy is an alternating product.

In terms of this kind of thing, discrepancy is more like Euler characteristic and codiscrepancy is more like homotopy cardinality.

I’ll get on to this properly next time. It’s a nice observation, but I’m still quite puzzled by the whole situation with mutual information and magnitude.

Posted by: Tom Leinster on January 23, 2025 10:24 PM | Permalink | Reply to this

Post a New Comment