### Entropy Modulo a Prime

#### Posted by Tom Leinster

In 1995, the German geometer Friedrich Hirzebruch retired, and a private booklet was put together to mark the occasion. That booklet included a short note by Maxim Kontsevich entitled “The $1\tfrac{1}{2}$-logarithm”.

Kontsevich’s note didn’t become publicly available until five years later, when it was included as an appendix to a paper on polylogarithms by Philippe Elbaz-Vincent and Herbert Gangl. Towards the end, it contains the following provocative words:

> **Conclusion:** If we have a random variable $\xi$ which takes finitely many values with all probabilities in $\mathbb{Q}$ then we can define not only the transcendental number $H(\xi)$ but also its “residues modulo $p$” for almost all primes $p$!

Kontsevich’s note was very short and omitted many details. I’ll put some flesh on those bones, showing how to make sense of the sentence above, and much more.

The “$H$” that Kontsevich uses here is the symbol for entropy — or
more exactly, Shannon entropy. So, I’ll begin by recalling what that is.
That will pave the way for what I *really* want to talk about, which is
a kind of entropy for probability distributions where the “probabilities”
are not real numbers, but elements of the field $\mathbb{Z}/p\mathbb{Z}$ of integers modulo a
prime $p$.

Let $\pi = (\pi_1, \ldots, \pi_n)$ be a finite probability distribution.
(It would be more usual to write a probability distribution as $p$, but I
want to reserve that letter for prime numbers.) The
**entropy** of $\pi$ is

$H_\mathbb{R}(\pi) = - \sum_{i : \pi_i \neq 0} \pi_i \log \pi_i.$

Usually this is just written as $H$, but I want to emphasize the role of the real numbers here: both the probabilities $\pi_i$ and the entropy $H_\mathbb{R}(\pi)$ belong to $\mathbb{R}$.
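As a minimal sketch of the formula (with the usual convention $0 \log 0 = 0$, handled by skipping zero entries):

```python
from math import log

def entropy(pi):
    """Shannon entropy H_R(pi) = -sum of pi_i * log(pi_i), skipping zero entries."""
    return -sum(x * log(x) for x in pi if x != 0)

# The uniform distribution on 2 elements has entropy log 2:
# entropy([0.5, 0.5]) == log(2)
```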

There are applications of entropy in dozens of branches of science… but none will be relevant here! This is purely a mathematical story, though if anyone can think of any possible application or interpretation of entropy modulo a prime, I’d love to hear it.

The challenge now is to find the correct analogue of entropy when the field $\mathbb{R}$ is replaced by the field $\mathbb{Z}/p\mathbb{Z}$ of integers mod $p$, for any prime $p$. So, we want to define a kind of entropy

$H_p(\pi_1, \ldots, \pi_n) \in \mathbb{Z}/p\mathbb{Z}$

when $\pi_i \in \mathbb{Z}/p\mathbb{Z}$.

We immediately run into an obstacle. Over $\mathbb{R}$, probabilities are
required to be nonnegative. Indeed, the logarithm in the definition of
entropy doesn’t make sense otherwise. But in $\mathbb{Z}/p\mathbb{Z}$,
there is no notion of positive or negative. So, what are we even going to
define the entropy *of*?

We take the simplest way out: ignore the problem. So, writing

$\Pi_n = \{ (\pi_1, \ldots, \pi_n) \in (\mathbb{Z}/p\mathbb{Z})^n : \pi_1 + \cdots + \pi_n = 1 \},$

we’re going to try to define

$H_p(\pi) \in \mathbb{Z}/p\mathbb{Z}$

for each $\pi = (\pi_1, \ldots, \pi_n) \in \Pi_n$.
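As a quick sanity check, $\Pi_n$ can be enumerated directly; it has $p^{n-1}$ elements, since the first $n-1$ coordinates are free. A sketch with the illustrative choices $p = 3$, $n = 2$:

```python
p = 3  # assumption: a small prime, just for illustration

# Pi_2: pairs in (Z/pZ)^2 whose coordinates sum to 1 mod p
Pi_2 = [(a, b) for a in range(p) for b in range(p) if (a + b) % p == 1]
# p^(n-1) = 3 such pairs: (0, 1), (1, 0), (2, 2)
```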

Let’s try the most direct approach to doing this. That is, let’s stare at the formula defining real entropy…

$H_\mathbb{R}(\pi) = - \sum_{i : \pi_i \neq 0} \pi_i \log \pi_i$

… and try to write down the analogous formula over $\mathbb{Z}/p\mathbb{Z}$.

The immediate question is: what should play the role of the logarithm mod $p$?

The crucial property of the ordinary logarithm is that it converts multiplication into addition. Specifically, we’re concerned here with logarithms of nonzero probabilities, and $\log$ defines a homomorphism from the multiplicative monoid $(0, 1]$ of nonzero probabilities to the additive group $\mathbb{R}$.

Mod $p$, then, we want a homomorphism from the multiplicative group $(\mathbb{Z}/p\mathbb{Z})^\times$ of nonzero probabilities to the additive group $\mathbb{Z}/p\mathbb{Z}$. And here we hit another obstacle: by Lagrange’s theorem, the order of the image of such a homomorphism divides both $|(\mathbb{Z}/p\mathbb{Z})^\times| = p - 1$ and $|\mathbb{Z}/p\mathbb{Z}| = p$, which are coprime, so the image is trivial. Apart from the zero map, no such homomorphism exists.
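A brute-force version of that argument, for the illustrative prime $p = 5$: a homomorphism is determined by its value $c$ on a generator $g$ of the cyclic group $(\mathbb{Z}/p\mathbb{Z})^\times$, and since $g^{p-1} = 1$ we need $(p-1)\,c \equiv 0 \pmod p$:

```python
p = 5  # assumption: a small prime for a brute-force check
g = 2  # 2 generates (Z/5Z)^x: its powers are 2, 4, 3, 1

# A homomorphism f into (Z/pZ, +) is determined by c = f(g),
# and g^(p-1) = 1 forces (p - 1) * c ≡ 0 (mod p).
valid_c = [c for c in range(p) if ((p - 1) * c) % p == 0]
# Only c = 0 survives: the zero map is the only homomorphism
```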

So, we seem to be stuck. Actually, we’re stuck in a way that often happens when you try to construct something new, working by analogy with something old: slavishly imitating the old situation, symbol for symbol, often doesn’t work. In the most interesting analogies, there are wrinkles.

To make some progress, instead of looking at the *formula* for entropy,
let’s look at the *properties* of entropy.

The most important property is a kind of recursivity. In the language
spoken by many patrons of the Café, finite probability distributions form
an *operad*. Explicitly, this means the following.

Suppose I flip a coin. If it’s heads, I roll a die, and if it’s tails, I draw from a pack of cards. This is a two-stage process with 58 possible final outcomes: either the face of a die or a playing card. Assuming that the coin toss, die roll and card draw are all fair, the probability distribution on the 58 outcomes is

$(1/12, \ldots, 1/12, 1/104, \ldots, 1/104),$

with $6$ copies of $1/12$ and $52$ copies of $1/104$. Generally, given a probability distribution $\gamma = (\gamma_1, \ldots, \gamma_n)$ on $n$ elements and, for each $i \in \{1, \ldots, n\}$, a probability distribution $\pi^i = (\pi^i_1, \ldots, \pi^i_{k_i})$ on $k_i$ elements, we get a composite distribution

$\gamma \circ (\pi^1, \ldots, \pi^n) = (\gamma_1 \pi^1_1, \ldots, \gamma_1 \pi^1_{k_1}, \ldots, \gamma_n \pi^n_1, \ldots, \gamma_n \pi^n_{k_n})$

on $k_1 + \cdots + k_n$ elements.

For example, take the coin-die-card process above. Writing $u_n$ for the uniform distribution on $n$ elements, the final distribution on $58$ elements is $u_2 \circ (u_6, u_{52})$, which I wrote out explicitly above.
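A sketch of this operadic composition in Python, using exact fractions, reproducing the coin–die–card distribution $u_2 \circ (u_6, u_{52})$:

```python
from fractions import Fraction

def uniform(n):
    """The uniform distribution u_n on n elements."""
    return [Fraction(1, n)] * n

def compose(gamma, pis):
    """Operadic composite gamma ∘ (pi^1, ..., pi^n): scale each pi^i by gamma_i and concatenate."""
    return [g * q for g, pi in zip(gamma, pis) for q in pi]

dist = compose(uniform(2), [uniform(6), uniform(52)])
# 58 outcomes: six copies of 1/12 followed by 52 copies of 1/104
```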

The important recursivity property of entropy is called the
**chain rule**, and it states that

$H_\mathbb{R}(\gamma \circ (\pi^1, \ldots, \pi^n)) = H_\mathbb{R}(\gamma) + \sum_{i = 1}^n \gamma_i H_\mathbb{R}(\pi^i).$
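A quick numerical check of the chain rule on the coin–die–card example (a sketch; the two sides agree only up to floating-point rounding):

```python
from math import log, isclose

def entropy(pi):
    # Shannon entropy, skipping zero probabilities
    return -sum(x * log(x) for x in pi if x)

def compose(gamma, pis):
    # Operadic composite: scale each pi^i by gamma_i and concatenate
    return [g * q for g, pi in zip(gamma, pis) for q in pi]

gamma = [0.5, 0.5]
pis = [[1 / 6] * 6, [1 / 52] * 52]

lhs = entropy(compose(gamma, pis))
rhs = entropy(gamma) + sum(g * entropy(pi) for g, pi in zip(gamma, pis))
# lhs and rhs agree up to floating-point error
```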

It’s easy to check that this is true. (It’s also nice to understand it in terms of information… but if I follow every tempting explanatory byway, I’ll run out of energy too soon.) And in fact, it characterizes entropy almost uniquely:

**Theorem**  Let $I$ be a function assigning a real number $I(\pi)$ to each finite probability distribution $\pi$. The following are equivalent:

1. $I$ is continuous in $\pi$ and satisfies the chain rule;

2. $I = c H_\mathbb{R}$ for some constant $c \in \mathbb{R}$.

The theorem as stated is due to Faddeev, and I blogged about it earlier this year. In fact, you can weaken “continuous” to “measurable” (a theorem of Lee), but that refinement won’t be important here.

What *is* important is this. In our quest to imitate real entropy in
$\mathbb{Z}/p\mathbb{Z}$, we now have something to aim for. Namely: we
want a sequence of functions $H_p : \Pi_n \to \mathbb{Z}/p\mathbb{Z}$
satisfying the obvious analogue of the chain rule. And if we’re really
lucky, there will be essentially only *one* such sequence.

We’ll discover that this is indeed the case. Once we’ve found the right definition of $H_p$ and proved this, we can very legitimately baptize $H_p$ as “entropy mod $p$” — no matter what weird and wonderful formula might be used to define it — because it has the same characteristic properties as entropy over $\mathbb{R}$.

I think I’ll leave you on that cliff-hanger. If you’d like to guess what the definition of entropy mod $p$ is, go ahead! Otherwise, I’ll tell you next time.

## Re: Entropy Modulo a Prime

As a number theorist, I find it odd that the analogues of $\mathbb{R}$ here are the finite fields rather than the $p$-adic fields or the $p$-adic integers.

An obvious advantage of working in the $p$-adics is that the logarithm is automatically well defined. Moreover, if we restrict our probabilities to take values in the $p$-adic integers, then the entropy is also automatically a $p$-adic integer and reduction mod $p$ makes sense.

I suppose the reduction of the $p$-adic entropy equals whatever you have in mind over the finite fields?