## December 4, 2020

### Entropy and Diversity: The Axiomatic Approach

#### Posted by Tom Leinster

As Emily was kind enough to point out earlier, my new book is out on the arXiv!

It’s arXived by agreement with the wonderful Cambridge University Press, who will publish it in April 2021. You can pre-order it now, as the ideal festive gift for any friend who enjoys deferred gratification.

I’ve been writing about entropy and diversity on this blog for nearly ten years now. The book collects together and connects together many of the stories I’ve told.

One of the reasons I wanted to write a book is to show how all these stories fit together, which can be much harder to discern from scattered blog posts over a long period. So if you’ve seen my posts on entropy or diversity and not really known what to make of it all, perhaps the book will provide an answer.

Here’s the list of chapters, with links to some related Café posts (plus one from Azimuth) — often from back when my ideas were in a less developed form.

Finally, a puzzle: what’s shown on the cover of the book?

Posted at December 4, 2020 2:29 PM UTC


### Re: Entropy and Diversity: The Axiomatic Approach

Posted by: AB on December 4, 2020 3:38 PM | Permalink | Reply to this

### Re: Entropy and Diversity: The Axiomatic Approach

Yup!

As the linked Wikipedia article says, it’s a giant marine sinkhole off the coast of Belize. It was a favourite diving spot of Jacques Cousteau. Putting it on the cover was the excellent idea of CUP editor Roger Astley.

Here’s a photo from the inside:

This photo makes it look much smaller than it is. It’s actually over 300m in diameter.

Posted by: Tom Leinster on December 4, 2020 4:08 PM | Permalink | Reply to this

### Re: Entropy and Diversity: The Axiomatic Approach

Congrats, Tom, and sorry for preempting you with the other post. I just couldn’t contain my enthusiasm!

Posted by: Emily Riehl on December 4, 2020 4:17 PM | Permalink | Reply to this

### Re: Entropy and Diversity: The Axiomatic Approach

No no, it was really nice! Today’s a day of celebration for me, and your post made me feel more celebratory still.

Posted by: Tom Leinster on December 4, 2020 4:19 PM | Permalink | Reply to this

### Re: Entropy and Diversity: The Axiomatic Approach

Many congratulations, Tom. I had to teach the Yoneda lemma today and looked up your “Basic Category Theory”. The start of Chapter 4 made my day: “A category is a world of objects, all looking at one another. Each sees the world from a different viewpoint”.

I look forward to reading equally inspiring explanations in the new book!

Posted by: Nicola Gambino on December 4, 2020 9:59 PM | Permalink | Reply to this

### Re: Entropy and Diversity: The Axiomatic Approach

Thank you very much, Nicola!

Posted by: Tom Leinster on December 5, 2020 4:53 PM | Permalink | Reply to this

### Entropy and Diversity

Beautiful book, Tom! 400-plus pages, but it is so nicely organized and readable that I felt immediately drawn in rather than intimidated. It probably helps that I’ve followed many of the posts here about the papers that the book covers, but everything I’ve read so far just flows, as if the reader could fill in the next sentence on their own. That means a ton of thought and background went into every sentence.

Posted by: Stefan Forcey on December 5, 2020 4:31 AM | Permalink | Reply to this

### Tsallis

Okay, so now I have a question!

I’m looking at the $q$-logarithmic entropy and the $q$-logarithmic information loss, specifically for $q=2$. It looks like $L_2(f)$ can be thought of as electrical resistance, and as such obeys Ohm’s law.

For instance, for parallel processes $f,f'$ let $L_2(f) =a$ and $L_2(f') =b.$ Now we are given the convex linearity of $L_2$, but I want to choose a specific $\lambda$ for these parallel processes. I choose $\lambda = \frac{1/a}{1/a+1/b} = \frac{b}{a+b}.$ Now the axiom says that $L_2$ of the weighted parallel processes is $\lambda^2 L_2(f)+(1-\lambda)^2 L_2(f').$ Using my choice of $\lambda$, that’s equal to $\frac{a b}{a+b}.$

…which is exactly the formula for the effective resistance of parallel electrical resistors that obey Ohm’s law. We can describe the $\lambda$ chosen above as the ratio of the current through $f$ to the total current through $f$ and $f'$, where voltage is 1. Therefore, for any given connected collection of morphisms, the information loss as measured by $L_2$ could be modeled with wires in an electric circuit.
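This arithmetic is easy to verify numerically. Here is a minimal sketch (the function names are mine, not from the thread): plugging Stefan's choice of $\lambda = b/(a+b)$ into the 2-convex-linearity formula $\lambda^2 a + (1-\lambda)^2 b$ should reproduce the parallel-resistance formula $ab/(a+b)$.

```python
# Check that lambda = b/(a+b) turns 2-convex linearity into
# the effective-resistance formula for parallel resistors.

def parallel_loss(a, b):
    """lambda^2 * L2(f) + (1-lambda)^2 * L2(f'), with Stefan's lambda."""
    lam = b / (a + b)
    return lam**2 * a + (1 - lam)**2 * b

def parallel_resistance(a, b):
    """Effective resistance of two parallel resistors obeying Ohm's law."""
    return a * b / (a + b)

for a, b in [(1.0, 1.0), (2.0, 3.0), (10.0, 0.5)]:
    assert abs(parallel_loss(a, b) - parallel_resistance(a, b)) < 1e-12
```

The two expressions agree identically: $\lambda^2 a + (1-\lambda)^2 b = (b^2 a + a^2 b)/(a+b)^2 = ab/(a+b)$.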

So my question is: is this a well-known fact, or a simple example of a bigger well-known theorem?

Posted by: Stefan Forcey on December 5, 2020 5:52 AM | Permalink | Reply to this

### Re: Tsallis

Hi Stefan! Thanks for the very nice words :-)

I certainly hadn’t thought in terms of resistance. That’s an interesting idea. The bit in the book about 2-logarithmic information loss (a.k.a. difference of Tsallis entropies of order 2) comes from a paper with John Baez and Tobias Fritz, and I know John at least has thought hard about the theory of electrical circuits. John, any thoughts?

Posted by: Tom Leinster on December 5, 2020 5:08 PM | Permalink | Reply to this

### Re: Tsallis

Thanks! I should mention that in reading near the beginning of your book I was fascinated by the parallel morphisms, and then searched for a specific pattern! I’ve been thinking a lot about the analogies between electric circuits and phylogenetics, and have been following the story of diversity measures. Here is a preprint for a recent publication with my student Drew Scalzo that reviews all the circuit ideas.

The next question would be: when reconstructing a phylogenetic tree or network, would it be useful to use the pairwise differences of Tsallis entropies of order 2 for the DNA of the extant (leaf) species? First we would have to define probability distributions for genomes. One way would be the distribution of the bases A, G, T, C: $p_i$ is the relative frequency of base $i$ in the genome. But that wouldn’t tell me much… I want information loss to be caused by mutation, so that the more a gene mutates the more of its original information is lost. It’s all about aligning the bases, and so what we want to measure is the frequency of substrings of all lengths (or just some lengths, to approximate) found in genome X. Then if genome Y is very similar, it will have many of the same substrings.
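As a rough sketch of the substring idea (the toy genomes and all names here are made up for illustration, not from any real dataset), one could build a length-$k$ substring ("$k$-mer") frequency distribution for each genome and compare their Tsallis entropies of order 2:

```python
from collections import Counter

def kmer_distribution(genome, k):
    """Relative frequencies of all length-k substrings of a genome."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}

def tsallis_2(p):
    """Tsallis entropy of order 2: 1 - sum_i p_i^2."""
    return 1 - sum(q * q for q in p.values())

x = "AGTCAGTCAGTC"   # toy "ancestral" genome
y = "AGTCAGTAAGTC"   # one substitution relative to x

# The mutation spreads probability mass over more distinct 3-mers,
# so the mutated genome has strictly higher Tsallis-2 entropy.
assert tsallis_2(kmer_distribution(x, 3)) < tsallis_2(kmer_distribution(y, 3))
```

This only measures each genome's internal diversity; turning it into a pairwise information-loss comparison would need a map between the two distributions, which is exactly the alignment question Stefan raises.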

Posted by: Stefan Forcey on December 5, 2020 8:19 PM | Permalink | Reply to this

### Re: Tsallis

Congratulations on the book, Tom!

John, any thoughts?

Hmm. I don’t get what Stefan is doing; I just wanted to say I’m listening. Where is his $\lambda$ coming from? Naively it looks like he’s chosen a fancy number to get something else to equal $a b/(a+b)$.

If there’s something to this, it’s probably helpful to remember that $a b/(a+b)$ is just addition conjugated by taking reciprocals:

$\frac{1}{\frac{1}{a} + \frac{1}{b}} = \frac{a b}{a+b}$

This operation is indeed important in electrical circuits, but does it show up in information and entropy? I don’t know.

Posted by: John Baez on December 6, 2020 4:44 PM | Permalink | Reply to this

### Re: Tsallis

A short answer is that by describing this very special choice of $\lambda$ I’m claiming that:

$L_2$ is a generalization of ideal electrical resistance.

…where idealized electrical resistance (impedance) is an assignment of non-negative real values to morphisms which obeys the Ohm and Kirchhoff laws.

But, to prove this claim I’d not only need the special fancy $\lambda,$ but also a way to show that electrical resistance is a special kind of information loss. Safer maybe would be this statement:

$L_2$ and ideal electrical resistance are both specific examples of continuous 2-convex-linear functors.

There is more to say about the phylogenetic applications, but first I should stop and see if you think that at least one of my above statements makes sense!

Posted by: Stefan Forcey on December 6, 2020 8:07 PM | Permalink | Reply to this

### Re: Tsallis

Nope, nope. Answering my own question here…

If electrical resistance is a functor that obeys the 2-convex-linearity equation for a special value of $\lambda$ only, it does not deserve to be called 2-convex-linear.

What we have are two different continuous functors, $L_2$ and $R$ (resistance), both with the non-negative reals as codomain. $L_2$ obeys a stronger condition than $R$. The 2-convex-linearity condition implies the condition for Ohmic resistance, for parallel paths in the circuit. I’m not sure how to state this in a slogan!

A little more about why this might be nice to know: if you have a (multi)graph with resistance values assigned to its edges and a selection of vertices as the boundary nodes (terminals) then there are some well-known graph transformations that leave the measured pairwise resistances at the terminals invariant. One is the Y-$\Delta$ transformation shown in the linked preprint above, another is replacing multi-edges with their equivalent single edge.
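The invariance of terminal resistances under the Y-$\Delta$ transformation can be checked directly using the standard textbook formulas (this is a sketch of the well-known transformation, not code from the linked preprint):

```python
def delta_to_y(r_ab, r_bc, r_ca):
    """Standard Delta-to-Y transformation formulas."""
    s = r_ab + r_bc + r_ca
    return (r_ab * r_ca / s,   # r_a: Y-leg attached to terminal a
            r_ab * r_bc / s,   # r_b
            r_bc * r_ca / s)   # r_c

def parallel(x, y):
    """Effective resistance of two parallel resistors."""
    return x * y / (x + y)

r_ab, r_bc, r_ca = 2.0, 3.0, 5.0
r_a, r_b, r_c = delta_to_y(r_ab, r_bc, r_ca)

# Pairwise resistance between terminals must agree before and after.
# In the delta, resistance a-b is r_ab in parallel with the series
# path r_ca + r_bc; in the Y, it is just r_a + r_b.
assert abs((r_a + r_b) - parallel(r_ab, r_ca + r_bc)) < 1e-12
assert abs((r_b + r_c) - parallel(r_bc, r_ab + r_ca)) < 1e-12
assert abs((r_c + r_a) - parallel(r_ca, r_ab + r_bc)) < 1e-12
```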

Posted by: Stefan Forcey on December 6, 2020 10:39 PM | Permalink | Reply to this

### Re: Entropy and Diversity: The Axiomatic Approach

This looks like a very nice book! I suspect I will be recommending it as supplemental reading at least the next time I teach statistical physics. (I recommended a lot of supplemental texts when I taught that this past spring, because the class wasn’t able to meet in person after March…) It calls to mind a good many “sweet sorrows” — ideas that I found quite engaging and worked with for a longish while but was never able to get into a satisfying shape. For example, I had the notion that in a similarity-weighted diversity index

$D_2^Z(p) := \left(\sum_{i j} Z_{i j} p_i p_j\right)^{-1} ,$

the sum over $i$ and $j$ is the expected score in a game whose goal is agreement and the two players play randomly. This leads to the question of what happens when the game involves more than two players — what about contractions of the form $Z_{i j k} p_i p_j p_k$? These arise in quantum information, where the $Z_{i j k}$ are the real parts of certain geometric phases, but we could also run into three-party similarity measures when comparing species. To take an artificial but pretty example, suppose $\{t_1,t_2,\ldots,t_7\}$ are seven different phenotypic traits, that $\{s_1,s_2,\ldots,s_7\}$ are seven species, and that traits and species correspond to points and lines in the Fano plane. Each species has three of the seven traits, every two species have one trait in common, and every trait is found in three species. All pairs are alike; we could say that $Z_{i j} = \frac{1}{3}$ for all $i \neq j$. But not all triads are alike, because a set of three lines in the Fano plane can either meet at a common point or not. Some sets of three species are more similar than others.

To move a little in the direction of biology, we could take a phylogenetic network. Let $S$ be a set of ancestral species and $T$ be a set of descendant species, with a directed graph $G$ providing paths from $S$ to $T$. These paths might diverge if there is an evolutionary radiation, and they might converge if there is hybridization or horizontal gene transfer. This structure defines a matroid on $T$, whose rank function $r(U)$ for $U \subset T$ is the size of the smallest set of vertices having the property that all paths from $S$ to $U$ must pass through it. (This kind of matroid is known as a gammoid.) The rank function $r$ is a kind of dissimilarity measure for the descendant species. A subset $U \subset T$ is independent in the matroid-theoretic sense if there exists a set of vertex-disjoint paths from $S$ whose ending points are exactly $U$; in biological language, this would say that the species in $U$ do not have a genetic common ancestor. (At least, to find one, you’d have to go back further into the past than $S$.)

The funny thing is that rank functions of matroids behave a lot like Shannon information of sets of random variables. They always satisfy $r(U) \leq r(V)$ for $U \subseteq V$, and they are submodular or strongly subadditive: for all $U, V$,

$r(U \cup V) + r(U \cap V) \leq r(U) + r(V).$

So, we have an entropy-like quantity coming just out of the graph structure, even before we put a probability distribution on the species. I find that a bit odd!
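A toy version of this construction can be checked in code: compute the gammoid rank of $U \subset T$ as the maximum number of vertex-disjoint $S$-to-$U$ paths (via unit-capacity max flow with vertex splitting), then verify submodularity over all pairs of subsets. The little network below, and all its names, are illustrative assumptions, not from the comment:

```python
from itertools import combinations

# Toy network: ancestors S, descendants T, one hybridization vertex h.
edges = [("s1", "h"), ("s2", "h"), ("h", "t1"), ("h", "t2"), ("s2", "t3")]
S = {"s1", "s2"}
T = ["t1", "t2", "t3"]

def rank(U):
    """Max number of vertex-disjoint S-to-U paths, via unit-capacity
    max flow where each vertex v is split into (v,'in') -> (v,'out')."""
    cap = {}
    def add(u, v):
        cap[(u, v)] = cap.get((u, v), 0) + 1
    for v in {x for e in edges for x in e}:
        add((v, "in"), (v, "out"))       # vertex capacity 1
    for u, v in edges:
        add((u, "out"), (v, "in"))
    for s in S:
        add("src", (s, "in"))            # super-source
    for t in U:
        add((t, "out"), "snk")           # super-sink
    def augment():
        stack, seen, parent = ["src"], {"src"}, {}
        while stack:
            u = stack.pop()
            if u == "snk":               # push one unit along the path
                v = "snk"
                while v != "src":
                    p = parent[v]
                    cap[(p, v)] -= 1
                    cap[(v, p)] = cap.get((v, p), 0) + 1
                    v = p
                return True
            for (a, b), c in list(cap.items()):
                if a == u and c > 0 and b not in seen:
                    seen.add(b); parent[b] = a; stack.append(b)
        return False
    flow = 0
    while augment():
        flow += 1
    return flow

# Submodularity: r(U|V) + r(U&V) <= r(U) + r(V) for all U, V in T.
subsets = [set(c) for k in range(len(T) + 1) for c in combinations(T, k)]
for U in subsets:
    for V in subsets:
        assert rank(U | V) + rank(U & V) <= rank(U) + rank(V)
```

In this example $r(\{t_1,t_2\}) = 1$ (both paths must pass through the hybridization vertex $h$, so $t_1$ and $t_2$ have a forced common ancestor), while $r(\{t_1,t_3\}) = 2$.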

This leads into the topic of higher-order mutual information defined by inclusion-exclusion, as you do in Remark 8.1.11, and how to make sense of it.

Posted by: Blake Stacey on December 9, 2020 8:21 PM | Permalink | Reply to this

Post a New Comment