
December 4, 2020

Entropy and Diversity: The Axiomatic Approach

Posted by Tom Leinster

As Emily was kind enough to point out earlier, my new book is out on the arXiv!

It’s arXived by agreement with the wonderful Cambridge University Press, who will publish it in April 2021. You can pre-order it now, as the ideal festive gift for any friend who enjoys deferred gratification.

Cover of book

You can read some blurb about the book on my page about it or CUP’s page.

I’ve been writing about entropy and diversity on this blog for nearly ten years now. The book collects together and connects together many of the stories I’ve told.

One of the reasons I wanted to write a book is to show how all these stories fit together, which can be much harder to discern from scattered blog posts over a long period. So if you’ve seen my posts on entropy or diversity and not really known what to make of it all, perhaps the book will provide an answer.

Here’s the list of chapters, with links to some related Café posts (plus one from Azimuth) — often from back when my ideas were in a less developed form.

  1. Fundamental functional equations

  2. Shannon entropy

  3. Relative entropy

  4. Deformations of Shannon entropy

  5. Means

  6. Species similarity and magnitude

  7. Value

  8. Mutual information and metacommunities

  9. Probabilistic methods

  10. Information loss

  11. Entropy modulo a prime

  12. The categorical origins of entropy

Finally, a puzzle: what’s shown on the cover of the book?

Posted at December 4, 2020 2:29 PM UTC


14 Comments & 0 Trackbacks

Re: Entropy and Diversity: The Axiomatic Approach

The Great Blue Hole

Posted by: AB on December 4, 2020 3:38 PM | Permalink | Reply to this

Re: Entropy and Diversity: The Axiomatic Approach


As the linked Wikipedia article says, it’s a giant marine sinkhole off the coast of Belize. It was a favourite diving spot of Jacques Cousteau. Putting it on the cover was the excellent idea of CUP editor Roger Astley.

Here’s a photo from the inside:

Great Blue Hole, interior

This photo makes it look much smaller than it is. It’s actually over 300m in diameter.

Posted by: Tom Leinster on December 4, 2020 4:08 PM | Permalink | Reply to this

Re: Entropy and Diversity: The Axiomatic Approach

Congrats Tom and sorry for preempting you with the other post. I just couldn’t contain my enthusiasm!

Posted by: Emily Riehl on December 4, 2020 4:17 PM | Permalink | Reply to this

Re: Entropy and Diversity: The Axiomatic Approach

No no, it was really nice! Today’s a day of celebration for me, and your post made me feel more celebratory still.

Posted by: Tom Leinster on December 4, 2020 4:19 PM | Permalink | Reply to this

Re: Entropy and Diversity: The Axiomatic Approach

Many congratulations, Tom. I had to teach the Yoneda lemma today and looked up your “Basic Category Theory”. The start of Chapter 4 made my day: “A category is a world of objects, all looking at one another. Each sees the world from a different viewpoint”.

I look forward to reading equally inspiring explanations in the new book!

Posted by: Nicola Gambino on December 4, 2020 9:59 PM | Permalink | Reply to this

Re: Entropy and Diversity: The Axiomatic Approach

Thank you very much, Nicola!

Posted by: Tom Leinster on December 5, 2020 4:53 PM | Permalink | Reply to this

Entropy and Diversity

Beautiful book, Tom! 400-plus pages, but it is so nicely organized and readable that I felt immediately drawn in rather than intimidated. It probably helps that I’ve followed many of the posts here about the papers that the book covers, but everything I’ve read so far just flows as if the reader could fill in the next sentence on their own. That means a ton of thought and background went into every sentence.

Posted by: Stefan Forcey on December 5, 2020 4:31 AM | Permalink | Reply to this


Okay, so now I have a question!

I’m looking at the $q$-logarithmic entropy and the $q$-logarithmic information loss, specifically for $q = 2$. It looks like $L_2(f)$ can be thought of as electrical resistance, and as such obeys Ohm’s law.

For instance, for parallel processes $f, f'$ let $L_2(f) = a$ and $L_2(f') = b$. Now we are given the convex linearity of $L_2$, but I want to choose a specific $\lambda$ for these parallel processes. I choose $\lambda = \frac{1/a}{1/a + 1/b} = \frac{b}{a+b}$. Now the axiom says that $L_2$ of the weighted parallel processes is $\lambda^2 L_2(f) + (1-\lambda)^2 L_2(f')$. Using my choice of $\lambda$, that’s equal to $\frac{a b}{a+b}$.

…which is exactly the formula for the effective resistance of parallel electrical resistors that obey Ohm’s law. We can describe the $\lambda$ chosen above as the ratio of the current through $f$ to the total current through $f$ and $f'$, where the voltage is 1. Therefore, for any given connected collection of morphisms, the information loss as measured by $L_2$ could be modeled with wires in an electric circuit.
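The identity behind this is easy to check numerically. Here is a minimal sketch (the function names are mine, not from the book or this thread), using exact rationals so the two formulas agree on the nose rather than up to floating-point error:

```python
from fractions import Fraction

def convex_2_combination(a, b):
    """lambda^2 * a + (1 - lambda)^2 * b with the special weight
    lambda = b / (a + b) described in the comment above."""
    lam = Fraction(b, a + b)
    return lam**2 * a + (1 - lam)**2 * b

def parallel_resistance(a, b):
    """Effective resistance of two parallel Ohmic resistors: ab / (a + b)."""
    return Fraction(a * b, a + b)

# The two formulas agree for any pair of positive resistances.
for a, b in [(1, 1), (2, 3), (5, 7), (10, 1)]:
    assert convex_2_combination(a, b) == parallel_resistance(a, b)

print(convex_2_combination(2, 3))  # 6/5, the effective resistance of 2 and 3 in parallel
```

Algebraically this is just $\lambda^2 a + (1-\lambda)^2 b = \frac{a b^2 + a^2 b}{(a+b)^2} = \frac{a b}{a+b}$ when $\lambda = \frac{b}{a+b}$.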

So my question is: is this a well-known fact, or a simple example of a bigger well-known theorem?

Posted by: Stefan Forcey on December 5, 2020 5:52 AM | Permalink | Reply to this

Re: Tsallis

Hi Stefan! Thanks for the very nice words :-)

I certainly hadn’t thought in terms of resistance. That’s an interesting idea. The bit in the book about 2-logarithmic information loss (a.k.a. difference of Tsallis entropies of order 2) comes from a paper with John Baez and Tobias Fritz, and I know John at least has thought hard about the theory of electrical circuits. John, any thoughts?

Posted by: Tom Leinster on December 5, 2020 5:08 PM | Permalink | Reply to this

RE: Tsallis

Thanks! I should mention that in reading near the beginning of your book I was fascinated by the parallel morphisms, and then searched for a specific pattern! I’ve been thinking a lot about the analogies between electric circuits and phylogenetics, and have been following the story of diversity measures. Here is a preprint for a recent publication with my student Drew Scalzo that reviews all the circuit ideas.

The next question would be: when reconstructing a phylogenetic tree or network, would it be useful to use the pairwise differences of Tsallis entropies of order 2 for the DNA of the extant (leaf) species? First we would have to define probability distributions for genomes. One way would be the distribution of the bases A, G, T, C: $p_i$ is the relative frequency of base $i$ in the genome. But that wouldn’t tell me much… I want information loss to be caused by mutation, so that the more a gene mutates the more of its original information is lost. It’s all about aligning the bases, and so what we want to measure is the frequency of substrings of all lengths (or just some lengths, to approximate) found in genome X. Then if genome Y is very similar, it will have many of the same substrings.
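To make the substring idea concrete, here is a rough sketch (entirely my own illustration, not anything from the book or preprint) of comparing two toy genomes by the order-2 Tsallis entropies of their $k$-mer distributions; a single substitution changes which 3-mers occur, and the entropy shifts accordingly:

```python
from collections import Counter

def kmer_distribution(genome, k):
    """Relative frequency of each length-k substring (k-mer) in a genome."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}

def tsallis_2(dist):
    """Tsallis entropy of order 2: 1 - sum_i p_i^2."""
    return 1 - sum(p * p for p in dist.values())

# A toy genome and a mutated copy differing by one substitution.
original = "ACGTACGTACGT"
mutated = "ACGTACCTACGT"
print(tsallis_2(kmer_distribution(original, 3)))  # lower: fewer distinct 3-mers
print(tsallis_2(kmer_distribution(mutated, 3)))   # higher: mutation adds new 3-mers
```

In a real application one would presumably sum or weight over a range of lengths $k$, as suggested above, rather than fix $k = 3$.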

Posted by: Stefan Forcey on December 5, 2020 8:19 PM | Permalink | Reply to this

Re: Tsallis

Congratulations on the book, Tom!

John, any thoughts?

Hmm. I don’t get what Stefan is doing; I just wanted to say I’m listening. Where is his $\lambda$ coming from? Naively it looks like he’s chosen a fancy number to get something else to equal $a b/(a+b)$.

If there’s something to this, it’s probably helpful to remember that $a b/(a+b)$ is just addition conjugated by taking reciprocals:

$$\frac{1}{\frac{1}{a} + \frac{1}{b}} = \frac{a b}{a+b}$$

This operation is indeed important in electrical circuits, but does it show up in information and entropy? I don’t know.

Posted by: John Baez on December 6, 2020 4:44 PM | Permalink | Reply to this

Re: Tsallis

A short answer is that by describing this very special choice of $\lambda$, I’m claiming that:

$L_2$ is a generalization of ideal electrical resistance.

…where idealized electrical resistance (impedance) is an assignment of non-negative real values to morphisms that obeys the Ohm and Kirchhoff laws.

But to prove this claim I’d need not only the special fancy $\lambda$, but also a way to show that electrical resistance is a special kind of information loss. Safer, maybe, would be this statement:

$L_2$ and ideal electrical resistance are both specific examples of continuous 2-convex-linear functors.

There is more to say about the phylogenetic applications, but first I should stop and see if you think that at least one of my above statements makes sense!

Posted by: Stefan Forcey on December 6, 2020 8:07 PM | Permalink | Reply to this

RE: Tsallis

Nope, nope. Answering my own question here…

If electrical resistance is a functor that obeys the 2-convex-linearity equation only for a special value of $\lambda$, it does not deserve to be called 2-convex-linear.

What we have are two different continuous functors, $L_2$ and $R$ (resistance), both with the non-negative reals as codomain. $L_2$ obeys a stronger condition than $R$: the 2-convex-linearity condition implies the condition for Ohmic resistance for parallel paths in the circuit. I’m not sure how to state this as a slogan!

A little more about why this might be nice to know: if you have a (multi)graph with resistance values assigned to its edges and a selection of vertices as the boundary nodes (terminals), then there are some well-known graph transformations that leave the measured pairwise resistances at the terminals invariant. One is the Y-$\Delta$ transformation shown in the linked preprint above; another is replacing multi-edges with their equivalent single edge.

Posted by: Stefan Forcey on December 6, 2020 10:39 PM | Permalink | Reply to this

Re: Entropy and Diversity: The Axiomatic Approach

This looks like a very nice book! I suspect I will be recommending it as supplemental reading at least the next time I teach statistical physics. (I recommended a lot of supplemental texts when I taught that this past spring, because the class wasn’t able to meet in person after March…) It calls to mind a good many “sweet sorrows” — ideas that I found quite engaging and worked with for a longish while but was never able to get into a satisfying shape. For example, I had the notion that in a similarity-weighted diversity index

$$D_2^Z(p) := \left(\sum_{i j} Z_{i j} p_i p_j\right)^{-1},$$

the sum over $i$ and $j$ is the expected score in a game whose goal is agreement and the two players play randomly. This leads to the question of what happens when the game involves more than two players: what about contractions of the form $Z_{i j k} p_i p_j p_k$? These arise in quantum information, where the $Z_{i j k}$ are the real parts of certain geometric phases, but we could also run into three-party similarity measures when comparing species. To take an artificial but pretty example, suppose $\{t_1, t_2, \ldots, t_7\}$ are seven different phenotypic traits, that $\{s_1, s_2, \ldots, s_7\}$ are seven species, and that traits and species correspond to points and lines in the Fano plane. Each species has three of the seven traits, every two species have one trait in common, and every trait is found in three species. All pairs are alike; we could say that $Z_{i j} = \frac{1}{3}$ for all $i \neq j$. But not all triads are alike, because a set of three lines in the Fano plane can either meet at a common point or not. Some sets of three species are more similar than others.
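For what it’s worth, the two-party index is easy to evaluate on this example. A quick sketch (my own code; taking $Z_{ii} = 1$ on the diagonal is my assumption, since the pairwise matrix alone cannot see the triad structure):

```python
from fractions import Fraction

def diversity_2(Z, p):
    """Similarity-weighted diversity of order 2: D_2^Z(p) = 1 / (p^T Z p)."""
    n = len(p)
    quadratic = sum(Z[i][j] * p[i] * p[j] for i in range(n) for j in range(n))
    return 1 / quadratic

# Seven Fano-plane species: similarity 1/3 between any two distinct species,
# 1 on the diagonal (an assumption), uniform distribution.
n = 7
Z = [[Fraction(1) if i == j else Fraction(1, 3) for j in range(n)] for i in range(n)]
p = [Fraction(1, n)] * n

print(diversity_2(Z, p))  # 7/3
```

The value $7/3$ sits strictly between $1$ (all species identical, $Z$ all ones) and $7$ (all species totally dissimilar, $Z$ the identity), reflecting the shared traits.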

To move a little in the direction of biology, we could take a phylogenetic network. Let $S$ be a set of ancestral species and $T$ be a set of descendant species, with a directed graph $G$ providing paths from $S$ to $T$. These paths might diverge if there is an evolutionary radiation, and they might converge if there is hybridization or horizontal gene transfer. This structure defines a matroid on $T$, whose rank function $r(U)$ for $U \subset T$ is the size of the smallest set of vertices having the property that all paths from $S$ to $U$ must pass through it. (This kind of matroid is known as a gammoid.) The rank function $r$ is a kind of dissimilarity measure for the descendant species. A subset $U \subset T$ is independent in the matroid-theoretic sense if there exists a set of vertex-disjoint paths from $S$ whose ending points are exactly $U$; in biological language, this would say that the species in $U$ do not have a genetic common ancestor. (At least, to find one, you’d have to go back further into the past than $S$.)

The funny thing is that rank functions of matroids behave a lot like Shannon information of sets of random variables. They always satisfy $r(U) \leq r(V)$ for $U \subseteq V$, and they are submodular or strongly subadditive: for all $U, V$,

$$r(U \cup V) + r(U \cap V) \leq r(U) + r(V).$$

So, we have an entropy-like quantity coming just out of the graph structure, even before we put a probability distribution on the species. I find that a bit odd!
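Here is a brute-force sketch of that rank function on a toy network (the graph and all names are invented for illustration, and the path enumeration is exponential, so this is only for tiny examples); it computes ranks as maximum systems of vertex-disjoint paths and checks the submodular inequality directly:

```python
from itertools import combinations

# Toy phylogenetic DAG: ancestors S, descendants T, internal bottleneck 'a'.
# Both t1 and t2 descend through 'a'; t3 descends directly from s2.
S = {"s1", "s2"}
T = {"t1", "t2", "t3"}
edges = {("s1", "a"), ("s2", "a"), ("a", "t1"), ("a", "t2"), ("s2", "t3")}

def simple_paths(src, dst):
    """All simple directed paths from src to dst, as vertex tuples."""
    paths, stack = [], [(src,)]
    while stack:
        path = stack.pop()
        if path[-1] == dst:
            paths.append(path)
            continue
        for u, v in edges:
            if u == path[-1] and v not in path:
                stack.append(path + (v,))
    return paths

def rank(U):
    """Gammoid rank of U: the largest number of pairwise vertex-disjoint
    S-to-U paths (disjointness forces distinct sources and endpoints)."""
    all_paths = [p for s in S for t in U for p in simple_paths(s, t)]
    for k in range(len(all_paths), 0, -1):
        for system in combinations(all_paths, k):
            vertices = [v for p in system for v in p]
            if len(vertices) == len(set(vertices)):  # pairwise vertex-disjoint
                return k
    return 0

# The rank is submodular, just like Shannon entropy of sets of variables.
subsets = [frozenset(c) for r in range(len(T) + 1) for c in combinations(T, r)]
for U in subsets:
    for V in subsets:
        assert rank(U | V) + rank(U & V) <= rank(U) + rank(V)

print(rank({"t1", "t2"}), rank({"t1", "t3"}))  # bottleneck: 1; disjoint: 2
```

The bottleneck vertex `a` is exactly what makes $r(\{t_1, t_2\}) = 1$: any two paths reaching $t_1$ and $t_2$ must share it, so those two species have a "common ancestor" in the sense above.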

This leads into the topic of higher-order mutual information defined by inclusion-exclusion, as you do in Remark 8.1.11, and how to make sense of it.

Posted by: Blake Stacey on December 9, 2020 8:21 PM | Permalink | Reply to this
