
April 14, 2017


Posted by Tom Leinster

What is the value of the whole in terms of the values of the parts?

More specifically, given a finite set whose elements have assigned “values” $v_1, \ldots, v_n$ and assigned “sizes” $p_1, \ldots, p_n$ (normalized to sum to $1$), how can we assign a value $\sigma(\mathbf{p}, \mathbf{v})$ to the set in a coherent way?

This seems like a very general question. But in fact, just a few sensible requirements on the function $\sigma$ are enough to pin it down almost uniquely. And the answer turns out to be closely connected to existing mathematical concepts that you probably already know.

Let’s write

$$\Delta_n = \Bigl\{ (p_1, \ldots, p_n) \in \mathbb{R}^n : p_i \geq 0, \sum p_i = 1 \Bigr\}$$

for the set of probability distributions on $\{1, \ldots, n\}$. Assuming that our “values” are positive real numbers, we’re interested in sequences of functions

$$\Bigl( \sigma \colon \Delta_n \times (0, \infty)^n \to (0, \infty) \Bigr)_{n \geq 1}$$

that aggregate the values of the elements to give a value to the whole set. So, if the elements of the set have relative sizes $\mathbf{p} = (p_1, \ldots, p_n)$ and values $\mathbf{v} = (v_1, \ldots, v_n)$, then the value assigned to the whole set is $\sigma(\mathbf{p}, \mathbf{v})$.

Here are some properties that it would be reasonable for $\sigma$ to satisfy.

Homogeneity  The idea is that whatever “value” means, the value of the set and the value of the elements should be measured in the same units. For instance, if the elements are valued in kilograms then the set should be valued in kilograms too. A switch from kilograms to grams would then multiply both values by 1000. So, in general, we ask that

$$\sigma(\mathbf{p}, c\mathbf{v}) = c \sigma(\mathbf{p}, \mathbf{v})$$

for all $\mathbf{p} \in \Delta_n$, $\mathbf{v} \in (0, \infty)^n$ and $c \in (0, \infty)$.

Monotonicity  The values of the elements are supposed to make a positive contribution to the value of the whole, so we ask that if $v_i \leq v'_i$ for all $i$ then

$$\sigma(\mathbf{p}, \mathbf{v}) \leq \sigma(\mathbf{p}, \mathbf{v}')$$

for all $\mathbf{p} \in \Delta_n$.

Replication  Suppose that our $n$ elements have the same size and the same value, $v$. Then the value of the whole set should be $n v$. This property says, among other things, that $\sigma$ isn’t an average: putting in more elements of value $v$ increases the value of the whole set!

If $\sigma$ is homogeneous, we might as well assume that $v = 1$, in which case the requirement is that

$$\sigma\bigl( (1/n, \ldots, 1/n), (1, \ldots, 1) \bigr) = n.$$

Modularity  This one’s a basic logical axiom, best illustrated by an example.

Imagine that we’re very ambitious and wish to evaluate the entire planet — or at least, the part that’s land. And suppose we already know the values and relative sizes of every country.

We could, of course, simply put this data into $\sigma$ and get an answer immediately. But we could instead begin by evaluating each continent, and then compute the value of the planet using the values and sizes of the continents. If $\sigma$ is sensible, this should give the same answer.

The notation needed to express this formally is a bit heavy. Let $\mathbf{w} \in \Delta_n$; in our example, $n = 7$ (or however many continents there are) and $\mathbf{w} = (w_1, \ldots, w_7)$ encodes their relative sizes. For each $i = 1, \ldots, n$, let $\mathbf{p}^i \in \Delta_{k_i}$; in our example, $\mathbf{p}^i$ encodes the relative sizes of the countries on the $i$th continent. Then we get a probability distribution

$$\mathbf{w} \circ (\mathbf{p}^1, \ldots, \mathbf{p}^n) = (w_1 p^1_1, \ldots, w_1 p^1_{k_1}, \,\ldots,\, w_n p^n_1, \ldots, w_n p^n_{k_n}) \in \Delta_{k_1 + \cdots + k_n},$$

which in our example encodes the relative sizes of all the countries on the planet. (Incidentally, this composition makes $(\Delta_n)$ into an operad, a fact that we’ve discussed many times before on this blog.) Also let

$$\mathbf{v}^1 = (v^1_1, \ldots, v^1_{k_1}) \in (0, \infty)^{k_1}, \,\ldots,\, \mathbf{v}^n = (v^n_1, \ldots, v^n_{k_n}) \in (0, \infty)^{k_n}.$$

In the example, $v^i_j$ is the value of the $j$th country on the $i$th continent. Then the value of the $i$th continent is $\sigma(\mathbf{p}^i, \mathbf{v}^i)$, so the axiom is that

$$\sigma \bigl( \mathbf{w} \circ (\mathbf{p}^1, \ldots, \mathbf{p}^n), (v^1_1, \ldots, v^1_{k_1}, \ldots, v^n_1, \ldots, v^n_{k_n}) \bigr) = \sigma \Bigl( \mathbf{w}, \bigl( \sigma(\mathbf{p}^1, \mathbf{v}^1), \ldots, \sigma(\mathbf{p}^n, \mathbf{v}^n) \bigr) \Bigr).$$

The left-hand side is the value of the planet calculated in a single step, and the right-hand side is its value when calculated in two steps, with continents as the intermediate stage.

Symmetry  It shouldn’t matter what order we list the elements in. So it’s natural to ask that

$$\sigma(\mathbf{p}, \mathbf{v}) = \sigma(\mathbf{p} \tau, \mathbf{v} \tau)$$

for any $\tau$ in the symmetric group $S_n$, where the right-hand side refers to the obvious $S_n$-actions.

Absent elements  Absent elements should count for nothing! In other words, if $p_1 = 0$ then we should have

$$\sigma\bigl( (p_1, \ldots, p_n), (v_1, \ldots, v_n) \bigr) = \sigma\bigl( (p_2, \ldots, p_n), (v_2, \ldots, v_n) \bigr).$$

This isn’t quite trivial. I haven’t yet given you any examples of the kind of function that $\sigma$ might be, but perhaps you already have in mind a simple one like this:

$$\sigma(\mathbf{p}, \mathbf{v}) = v_1 + \cdots + v_n.$$

In words, the value of the whole is simply the sum of the values of the parts, regardless of their sizes. But if $\sigma$ is to have the “absent elements” property, this won’t do. (Intuitively, if $p_i = 0$ then we shouldn’t count $v_i$ in the sum, because the $i$th element isn’t actually there.) So we’d better modify this example slightly, instead taking

$$\sigma(\mathbf{p}, \mathbf{v}) = \sum_{i \,:\, p_i > 0} v_i.$$

This function (or rather, sequence of functions) does have the “absent elements” property.
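To make the axioms concrete, here is a small numerical sketch (my own illustration, not code from the post) of this “sum over present elements” function, checking the Replication and Modularity properties on toy data:

```python
# Illustrative sketch: the "sum over present elements" aggregation,
# with numeric checks of the Replication and Modularity axioms.

def sigma0(p, v):
    """Value of the whole: sum of v_i over elements with p_i > 0."""
    return sum(vi for pi, vi in zip(p, v) if pi > 0)

# Replication: n equal-sized elements of value v give value n*v.
n, val = 4, 2.5
assert sigma0([1/n] * n, [val] * n) == n * val

# Modularity: evaluating all parts at once agrees with evaluating
# each block first, then aggregating the block values.
w = [0.6, 0.4]                    # relative sizes of two "continents"
p1, v1 = [0.5, 0.5], [3.0, 5.0]   # countries on continent 1
p2, v2 = [0.2, 0.8], [7.0, 1.0]   # countries on continent 2
composed_p = [w[0] * q for q in p1] + [w[1] * q for q in p2]
lhs = sigma0(composed_p, v1 + v2)
rhs = sigma0(w, [sigma0(p1, v1), sigma0(p2, v2)])
assert abs(lhs - rhs) < 1e-12     # both equal 16.0
```
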

Continuity in positive probabilities  Finally, we ask that for each $\mathbf{v} \in (0, \infty)^n$, the function $\sigma(-, \mathbf{v})$ is continuous on the interior of the simplex $\Delta_n$, that is, continuous over those probability distributions $\mathbf{p}$ such that $p_1, \ldots, p_n > 0$.

Why only over the interior of the simplex? Basically because of natural examples of $\sigma$ like the one just given, which is continuous on the interior of the simplex but not on the boundary. Generally, it’s sometimes useful to make a sharp, discontinuous distinction between the cases $p_i > 0$ (presence) and $p_i = 0$ (absence).


Arrow’s famous theorem states that a few apparently mild conditions on a voting system are, in fact, mutually contradictory. The mild conditions above are not mutually contradictory. In fact, there’s a one-parameter family $\sigma_q$ of functions, each of which satisfies these conditions. For real $q \neq 1$, the definition is

$$\sigma_q(\mathbf{p}, \mathbf{v}) = \Bigl( \sum_{i \,:\, p_i > 0} p_i^q v_i^{1 - q} \Bigr)^{1/(1 - q)}.$$

For instance, $\sigma_0$ is the example of $\sigma$ given above.

The formula for $\sigma_q$ is obviously invalid at $q = 1$, but it converges to a limit as $q \to 1$, and we define $\sigma_1(\mathbf{p}, \mathbf{v})$ to be that limit. Explicitly, this gives

$$\sigma_1(\mathbf{p}, \mathbf{v}) = \prod_{i \,:\, p_i > 0} (v_i/p_i)^{p_i}.$$
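For readers who want the intermediate step, here is a quick sketch (my own, not from the post) of why the limit takes this form:

```latex
% Write f(q) = \log \sum_{i : p_i > 0} p_i^q v_i^{1-q}.
% Since \sum_i p_i = 1, we have f(1) = 0, so L'Hopital's rule gives
\log \sigma_q = \frac{f(q)}{1 - q}
  \;\xrightarrow{\; q \to 1 \;}\; -f'(1)
  = -\sum_{i \,:\, p_i > 0} p_i \bigl( \log p_i - \log v_i \bigr)
  = \sum_{i \,:\, p_i > 0} p_i \log \frac{v_i}{p_i},
% and exponentiating recovers
% \sigma_1(\mathbf{p}, \mathbf{v}) = \prod_{i : p_i > 0} (v_i/p_i)^{p_i}.
```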

In the same way, we can define $\sigma_{-\infty}$ and $\sigma_\infty$ as the appropriate limits:

$$\sigma_{-\infty}(\mathbf{p}, \mathbf{v}) = \max_{i \,:\, p_i > 0} v_i/p_i, \qquad \sigma_{\infty}(\mathbf{p}, \mathbf{v}) = \min_{i \,:\, p_i > 0} v_i/p_i.$$

And it’s easy to check that for each $q \in [-\infty, \infty]$, the function $\sigma_q$ satisfies all the natural conditions listed above.
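A sketch in Python of the whole family $\sigma_q$, including the limiting cases (my own illustration; the variable names and tolerances are mine):

```python
import math

# Sketch of the one-parameter family sigma_q, assuming positive
# values v_i and p in the simplex, with the limiting cases q = 1
# and q = +/- infinity handled explicitly.

def sigma(q, p, v):
    pairs = [(pi, vi) for pi, vi in zip(p, v) if pi > 0]
    if q == 1:
        # sigma_1: product of (v_i/p_i)^(p_i) over present elements
        return math.prod((vi / pi) ** pi for pi, vi in pairs)
    if q == math.inf:
        return min(vi / pi for pi, vi in pairs)
    if q == -math.inf:
        return max(vi / pi for pi, vi in pairs)
    s = sum(pi ** q * vi ** (1 - q) for pi, vi in pairs)
    return s ** (1 / (1 - q))

p = [0.2, 0.3, 0.5]
v = [2.0, 1.0, 4.0]

# The q != 1 formula converges to the q = 1 product as q -> 1:
assert abs(sigma(1 + 1e-7, p, v) - sigma(1, p, v)) < 1e-5

# Homogeneity: scaling all values by c scales the whole by c.
assert abs(sigma(2, p, [3 * x for x in v]) - 3 * sigma(2, p, v)) < 1e-9

# sigma_q is sandwiched between the two extreme cases.
assert sigma(math.inf, p, v) <= sigma(2, p, v) <= sigma(-math.inf, p, v)
```
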

These functions $\sigma_q$ might be unfamiliar to you, but they have some special cases that are quite well explored. In particular:

  • Suppose you’re in a situation where the elements don’t have “sizes”. Then it would be natural to take $\mathbf{p}$ to be the uniform distribution $\mathbf{u}_n = (1/n, \ldots, 1/n)$. In that case, $\sigma_q(\mathbf{u}_n, \mathbf{v}) = \mathrm{const} \cdot \bigl( \sum v_i^{1 - q} \bigr)^{1/(1 - q)}$, where the constant is a certain power of $n$. When $q \leq 0$, this is exactly a constant times $\|\mathbf{v}\|_{1 - q}$, the $(1 - q)$-norm of the vector $\mathbf{v}$.

  • Suppose you’re in a situation where the elements don’t have “values”. Then it would be natural to take $\mathbf{v}$ to be $\mathbf{1} = (1, \ldots, 1)$. In that case, $\sigma_q(\mathbf{p}, \mathbf{1}) = \bigl( \sum p_i^q \bigr)^{1/(1 - q)}$. This is the quantity that ecologists know as the Hill number of order $q$ and use as a measure of biological diversity. Information theorists know it as the exponential of the Rényi entropy of order $q$, the special case $q = 1$ being Shannon entropy. And actually, the general formula for $\sigma_q$ is very closely related to Rényi relative entropy (which Wikipedia calls Rényi divergence).
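The second special case is easy to check numerically. A sketch (my own, using the standard formulas for Hill numbers and Shannon entropy):

```python
import math

# Numeric check: with all values equal to 1, sigma_q reduces to the
# Hill number of order q, and at q = 1 this is the exponential of
# Shannon entropy.

def hill(q, p):
    p = [pi for pi in p if pi > 0]
    if q == 1:
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1 / (1 - q))

def sigma(q, p, v):
    pairs = [(pi, vi) for pi, vi in zip(p, v) if pi > 0]
    if q == 1:
        return math.prod((vi / pi) ** pi for pi, vi in pairs)
    return sum(pi ** q * vi ** (1 - q) for pi, vi in pairs) ** (1 / (1 - q))

p = [0.5, 0.25, 0.25]
ones = [1.0] * len(p)
for q in [0, 0.5, 1, 2]:
    assert abs(sigma(q, p, ones) - hill(q, p)) < 1e-12

# q = 0 simply counts the species present, and the uniform
# distribution on n species has Hill number n for every q.
assert hill(0, p) == 3.0
assert abs(hill(1, [0.25] * 4) - 4.0) < 1e-12
```
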

Anyway, the big — and as far as I know, new — result is:

Theorem  The functions $\sigma_q$ are the only functions $\sigma$ with the seven properties above.

So although the properties above don’t seem that demanding, they actually force our notion of “aggregate value” to be given by one of the functions in the family $(\sigma_q)_{q \in [-\infty, \infty]}$. And although I didn’t even mention the notions of diversity or entropy in my justification of the axioms, they come out anyway as special cases.

I covered all this yesterday in the tenth and penultimate installment of the functional equations course that I’m giving. It’s written up on pages 38–42 of the notes so far. There you can also read how this relates to more realistic measures of biodiversity than the Hill numbers. Plus, you can see an outline of the (quite substantial) proof of the theorem above.

Posted at April 14, 2017 4:17 PM UTC

9 Comments & 0 Trackbacks

Re: Value

This looks fascinating! But how am I supposed to think of such a notion of value? Concretely, what does it mean that each element of the set has an assigned probability $p_i$? The obvious interpretation is that the element $i$ is actually contained in the set only with probability $p_i$, and that all these probabilities are independent. This would suggest that taking the expectation value $\sum_i p_i v_i$ should be a reasonable notion of value, but this contradicts the replication property (as you note explicitly).

Perhaps the answer is that you want diversity to be a value in itself? Do you have in mind a way to motivate this without talking about diversity measures?

Oh, and there’s a small typo in the definition of $\sigma_1$, where the subscript $i$ should be in the exponent.

evaluate the entire planet

A beautiful pun!

Posted by: Tobias Fritz on April 14, 2017 5:09 PM | Permalink | Reply to this

Re: Value


To be honest, I’m not sure how you’re supposed to think about value in general. I’d like some help developing my intuition about it.

One point I find helpful is Remark 5.18 of the notes: that if you regard each of the “parts” or “elements” $i$ as made up of a certain number of individuals, then $\sigma_q(\mathbf{p}, \mathbf{v})$ can be understood as the number of individuals times the average value per individual.

I’d tend to think of the $p_i$ as proportions. A typical example (Example 5.16(iv) in the notes) is this. We have an ecological community of some kind, divided into $n$ subcommunities. These might be geographical sites, e.g. west of the river and east of the river. Then $p_i$ denotes the size of the $i$th subcommunity, normalized so that $\sum p_i = 1$. And $v_i$ denotes the diversity of the $i$th subcommunity. My general question then becomes: what is the value of the whole community in terms of the values of the subcommunities? That, as you more or less guessed, was how I started on this line of thought.

But I’d love to have better intuition. It would be really helpful to have some examples that have nothing to do with diversity. If you can think of any, please pass them on!

Posted by: Tom Leinster on April 14, 2017 5:32 PM | Permalink | Reply to this

Re: Value

My intuition also has a hard time reconciling the replication property with the normalization of the $p_i$’s. Normalizing the “total size” seems to say that we don’t care about the “total amount” but about some kind of weighted average, but replication says that’s not it at all.

Here’s another way to describe the formula for $\sigma_q$ that is probably in your notes (or at least in the omitted proof): take the $q$th power of all the sizes, then use them as the weights in the formula for the weighted $(1-q)$-power mean. (Of course it isn’t exactly a power mean any more, since the $q$th powers of the sizes will no longer sum to 1.)

Posted by: Mike Shulman on April 15, 2017 12:37 PM | Permalink | Reply to this

Re: Value

There’s another way to see the formula for $\sigma_q$ as a power mean:

$$\sigma_q(\mathbf{p}, \mathbf{v}) = \Bigl( \sum p_i \Bigl(\frac{v_i}{p_i}\Bigr)^{1 - q} \Bigr)^{1/(1 - q)} = M_{1 - q}(\mathbf{p}, \mathbf{v}/\mathbf{p})$$

where the quotient $\mathbf{v}/\mathbf{p}$ is defined coordinatewise and $M_{1 - q}$ is the power mean of order $1 - q$.

Suppose each “element” $i$ of our set is made up of a collection of $k_i$ individuals, and write $k = \sum_i k_i$. Then it’s reasonable to take $p_i = k_i/k$, and by the formula above,

$$\sigma_q( \mathbf{p}, \mathbf{v} ) = k \cdot M_{1 - q}\Bigl(\mathbf{p}, \Bigl(\frac{v_1}{k_1}, \ldots, \frac{v_n}{k_n}\Bigr)\Bigr).$$

This invites us to imagine taking the value of the $i$th element and sharing it evenly among the $k_i$ individuals that make up that element. Then each individual has a value of $v_i/k_i$, and the last displayed equation says that the value of the whole is the number of individuals times the average value of each individual.
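This identity is easy to verify numerically; a sketch (my own illustration, with a hypothetical `power_mean` helper):

```python
import math

# Numeric check that sigma_q agrees with the weighted power mean
# M_{1-q} applied to the coordinatewise quotient v/p.

def power_mean(t, weights, xs):
    """Weighted power mean of order t (t = 0 is the geometric mean)."""
    if t == 0:
        return math.prod(x ** w for w, x in zip(weights, xs))
    return sum(w * x ** t for w, x in zip(weights, xs)) ** (1 / t)

def sigma(q, p, v):
    if q == 1:
        return math.prod((vi / pi) ** pi for pi, vi in zip(p, v))
    return sum(pi ** q * vi ** (1 - q) for pi, vi in zip(p, v)) ** (1 / (1 - q))

p = [0.25, 0.75]
v = [2.0, 6.0]
for q in [0, 0.5, 1, 2, 3]:
    ratio = [vi / pi for pi, vi in zip(p, v)]
    assert abs(sigma(q, p, v) - power_mean(1 - q, p, ratio)) < 1e-9
```
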

It may be that “value” is not the best word. I’m open to suggestions!

Posted by: Tom Leinster on April 15, 2017 2:32 PM | Permalink | Reply to this

Re: Value

You mentioned this in passing:

Incidentally, this composition makes $(\Delta_n)$ into an operad, a fact that we’ve discussed many times before on this blog.

but you didn’t say what all the properties of $\sigma$ mean in operadic language. I expect you noticed that most of them are quite natural:

  • Modularity means that $\sigma$ makes $(0,\infty)$ into an algebra over the operad $(\Delta_n)$. (Actually, it doesn’t include the unit condition explicitly, but that follows from Homogeneity and Replication.)

  • Symmetry means that this is an algebra for the symmetric operad $(\Delta_n)$.

  • Absent Elements means that this is actually an algebra for the semicartesian operad $(\Delta_n)$.

  • Homogeneity means that this is an algebra in the category of $(0,\infty)$-sets, where $(0,\infty)$ is a multiplicative monoid.

  • Monotonicity means that it is an algebra in the category of posets.

  • Continuity In Positive Probabilities means that if we restrict it to an action of the operad $(\Delta^\circ_n)$ of interiors of simplices then it is an algebra in the category of topological spaces. Absent Elements means that the action of all of $(\Delta_n)$ is determined by its action on $(\Delta^\circ_n)$; in fact I suspect that $(\Delta_n)$ is the free semicartesian operad on the symmetric operad $(\Delta^\circ_n)$.

So all together these properties say that we are making $(0,\infty)$ into an algebra for $(\Delta^\circ_n)$ in the category of topological ordered $(0,\infty)$-sets, where $(\Delta^\circ_n)$ is a topological operad (without order or $(0,\infty)$-action).

I can’t think of a nice operadic way to state Replication, though, which probably has something to do with its weirdness to me. I notice that Replication for a fixed value of $n$ is just a normalization condition; its nontriviality has to do with how the normalizations of the actions at different values of $n$ are related.

Posted by: Mike Shulman on April 15, 2017 11:08 PM | Permalink | Reply to this

Re: Value

Thanks, Mike; that’s nice! The post above is very close to stuff I said at the functional equations course, where the audience is very mixed (a wide variety of different kinds of mathematicians, plus a few biologists and physicists). That mixedness is really great, but prevents me from saying sleek categorical things like you just did.

I’ve been trying to think of how to state the replication principle in a nice categorical way, but without success. At one stage I thought the key might be to extend the $(\Delta_n)$-algebra structure $\sigma_q$ on $(0, \infty)$ to a $(0, \infty)^\bullet$-algebra structure, where by $(0, \infty)^\bullet$ I mean the operad whose set of $n$-ary operations is $(0, \infty)^n$ and which contains $(\Delta_n)$ as a suboperad. But I haven’t figured that out.

There’s a similar, related, challenge that I’ve come across before. The Shannon entropy of a finite probability distribution $\mathbf{p}$,

$$H(\mathbf{p}) = - \sum_i p_i \log p_i,$$

depends on the choice of base for the logarithm, and changing the base multiplies $H$ by a constant factor. Several axiomatic characterizations of Shannon entropy only characterize it up to a constant factor, or else include an axiom specifically to eliminate that factor. For instance,

$$H(1/n, 1/n, \ldots, 1/n) = \log n,$$

so including the axiom “$H(1/2, 1/2) = 1$” forces the base of the logarithm to be $2$.

One could choose to work not with entropy but with its “exponential”. Let me temporarily write $H_b(\mathbf{p})$ for the entropy defined using logarithms to base $b$. Then by the “exponential of entropy”, I mean

$$D(\mathbf{p}) = b^{H_b(\mathbf{p})} = \frac{1}{p_1^{p_1} \cdots p_n^{p_n}}.$$

This is independent of the choice of $b$! So whereas $H(\mathbf{p})$ is only well-defined up to a constant factor (if we allow $b$ to vary), $D(\mathbf{p})$ is really a unique, canonical thing. That’s one benefit of using $D$ rather than $H$. Another, related, benefit is the formula

$$D(1/n, 1/n, \ldots, 1/n) = n,$$

which again doesn’t contain a logarithm. This last equation is often called the “effective number” property, and is very close to the property I called “replication” in my post.
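The base-independence of $D$ and the effective number property are easy to confirm numerically; a small sketch (my own, not from the comment):

```python
import math

# Numeric check that the exponential of entropy D(p) = b**H_b(p)
# does not depend on the base b, and that the uniform distribution
# on n outcomes has D = n (the "effective number" property).

def D(p, base):
    h = -sum(pi * math.log(pi, base) for pi in p if pi > 0)
    return base ** h

p = [0.5, 0.3, 0.2]
assert abs(D(p, 2) - D(p, 10)) < 1e-9                      # base-independent
assert abs(D(p, 2) - math.prod(pi ** -pi for pi in p)) < 1e-9
assert abs(D([0.25] * 4, math.e) - 4.0) < 1e-9             # effective number
```
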

Now, obviously any axiomatic characterization of Shannon entropy $H$ can be translated into an axiomatic characterization of $D$. Since many of these characterizations only pin $H$ down up to a constant factor, they only pin $D$ down up to a constant power. However, there’s something special about $D$ itself as opposed to $D^c$ for any power $c \neq 1$, since getting $D(1/n, \ldots, 1/n)$ to come out as $n$ rather than $n^c$ is special. So we might find ourselves putting in the condition

$$D(1/n, \ldots, 1/n) = n$$

by hand.

This isn’t very satisfactory; it would be better if it could be understood as something categorically natural. But I don’t know how to do this. For example, back here I described a categorical characterization of $H$: Shannon entropy and its scalar multiples are exactly the internal $\mathbf{P}$-algebras in the categorical $\mathbf{P}$-algebra $\mathbb{R}_+$. (Here $\mathbf{P}$ is the operad $(\Delta_n)$, and all the terminology is defined in that post.) You can easily translate that into a categorical characterization of $D$ and its powers. But I don’t know a natural categorical way to put my finger on $D$ itself.

Posted by: Tom Leinster on April 17, 2017 11:21 AM | Permalink | Reply to this

Re: Value

Is there any particular reason you object to the scaling freedom in characterizations of entropy? Lots of perfectly respectable characterization theorems only work up to scaling: for each $n$-dimensional vector space $V$, there is a unique alternating $n$-linear form on $V^n$, up to a constant factor. There is a unique translation-invariant $\sigma$-finite measure on $\mathbb{R}^n$, up to a constant factor. There are standard choices of normalization in those cases, but I’m not sure if they’re exactly categorically natural.

Posted by: Mark Meckes on April 17, 2017 6:55 PM | Permalink | Reply to this

Re: Value

Is there any particular reason you object to the scaling freedom in characterizations of entropy?

Sorry, I probably expressed myself badly. I don’t object to the scaling freedom in characterizations of entropy. What I object to — or more accurately, would like to find a way to bypass — is the freedom in characterizations of diversity $D$ (that is, the exponential of entropy).

The fundamental difference between $D(\mathbf{p}) = \prod_i p_i^{-p_i}$ and its logarithm $H(\mathbf{p}) = - \sum_i p_i \log p_i$ is that $D$ does not depend on a choice of base, but $H$ does. So while it seems reasonable to me that characterizations of $H$ leave the base unspecified (and therefore contain one degree of freedom), it seems like more of a shortcoming when characterizations of $D$ contain a corresponding degree of freedom.

So, I want to find a categorical viewpoint from which the “effective number” condition $D(1/n, \ldots, 1/n) = n$ is natural.

Posted by: Tom Leinster on April 17, 2017 7:11 PM | Permalink | Reply to this

Re: Value

Ah, I wasn’t reading closely enough. I agree, it would be very nice to find a categorical viewpoint that makes the effective number condition natural.

Posted by: Mark Meckes on April 17, 2017 7:28 PM | Permalink | Reply to this
