Means
Posted by Tom Leinster
I’ve been interested for a while in abstract notions of size, and this has got me thinking about abstract notions of measure—hence, inescapably, of integration. Now integration can be thought of as an averaging process: for instance, the mean value of a function is simply . So means provide another way to think about measure and integral (and probability, indeed).
For most types of integration there are restrictions on the functions involved: you can only integrate an integrable function. But I’m going to describe three slightly unusual settings in which the challenge is to find a notion of integration, or mean, in which every function is allowed. Here they are:
- The mean of binary digits
- Amenable groups
- Arrow’s Theorem on voting systems.
Then there’s a postscript on voting and enriched categories.
The mean of binary digits
Let’s start with something really minimal and structureless. Fix a set . Is there any meaningful way of assigning to each function a ‘mean value’ ?
It depends what mean means. Certainly the mean of a constant function should be that constant, so the first axiom is:
a. if has constant value then (where )
Means are usually linear, so we now want to say something like ‘’. We can’t necessarily add together binary digits (since ), but it’s OK if and have disjoint supports. So the second and final axiom is:
b. if and there is no for which then .
A mean on is a function satisfying these two axioms.
There are lots of equivalent formulations of this concept.
For instance, is a Boolean algebra, so is too, and you can equivalently replace b with the superficially stronger axiom that preserves the whole Boolean algebra structure: where and .
Or, you can switch from integration to measure. It’s perhaps unfortunate that the letter is commonly used both for mean and for measure, but I’ll take advantage of this coincidence. For subsets of , I’ll write to mean , where is the characteristic function of . Writing for powerset, then becomes a map . Axiom a then says that and , and axiom b can be stated as the inclusion-exclusion principle: (). Such a is also called a valuation.
Or, since the mean value is restricted to lie in , you might as well just record those subsets for which . Let’s write and say that a subset is large if , and small if . Axioms a and b are then equivalent to:
- every subset of is either large or small, but not both
- every subset of a small set is small
- a finite union of small sets is small.
A set of subsets of with these three properties is called an ultrafilter on . So, means are the same as ultrafilters.
In all three of the situations that I’ll describe, the basic question is this:
Do nontrivial means exist?
In the current situation, ‘nontrivial’ means the following. There is, for each , a trivial mean given by If you’re thinking of as a function , this is evaluation at . If you’re thinking of as an element of the product set , it’s projection onto the th factor. Regarding as a measure, it’s the point measure, or Dirac delta, at . Regarding as an ultrafilter, it’s the principal ultrafilter at : a subset is large if and only if it contains .
It’s an easy exercise to show that when is finite, these are the only means. When is infinite, the situation is more interesting. There are nontrivial means, but you can’t construct any: you have to invoke some form of the Axiom of Choice. (I’ll assume Choice throughout.) So the answer to the question is:
No if is finite; yes if is infinite.
Amenable groups
Now let’s try the same thing, but giving the structure of a group. The mean should respect the group structure in the following sense. Given , write for left translation by . Then we require
c. , for all and .
Thinking in terms of measure, this says that the measure of a subset is the same as the measure of each of its translates.
But there’s an instant problem. For example, taking , we have —the even and odd numbers have the same measure. But the disjoint union of and is , which has measure . So . Similarly, for any positive integer , and if is a finite group of order then for all . All of these things are impossible if mean values are restricted to lie in .
So, we make the question more interesting by allowing to take values in the real interval . Precisely: a mean on a group is a function satisfying axioms a, b and c. A group is amenable if it admits at least one mean. (There are no ‘trivial’ means to be excluded.)
The word ‘amenable’ is kind of a pun. In ordinary English the word means ‘willing’ or ‘easy to work with’. But said out loud (by me, at least) it sounds like ‘a-mean-able’.
(You might think that there should be separate definitions of left amenable and right amenable, because of the asymmetry in the definition above. But the two are equivalent. Any group is isomorphic to its opposite , the same set with the multiplication reversed. Hence is left amenable iff is left amenable. On the other hand, is left amenable iff is right amenable.)
The theory of amenable groups has been intensively studied. There are a zillion equivalent ways of stating the definition, and there are close and fascinating links to all sorts of topics in algebra, topology and analysis. Here I’ll just say a few simple things.
Every finite group is amenable. In fact, there is exactly one mean on a finite group: it’s the mean in the obvious sense.
Every abelian group is amenable. This is not at all obvious, as far as I know. If you haven’t seen this before, I encourage you to stop and think for a few minutes about how you might prove that is amenable: how would you define a mean, or a measure?
…
From the arguments above, the measure needs to be set up so that any coset of has measure . What about, say, the set of all primes in ? You’d imagine that it would probably have measure , since the primes get sparser and sparser, but that’s not so clear.
The standard way to ‘construct’ a measure, which I won’t describe, uses the concept of Følner sequence. It’s a non-constructive definition. Among other things, it involves choosing a nonprincipal ultrafilter on the set —in other words, a nontrivial mean on , in the sense of the first part above.
That’s just . Showing that every abelian group is amenable usually seems to be proved by bootstrapping your way up from to finitely generated abelian groups, and from there to all abelian groups. At one point I tried to discover whether the proof could be streamlined, and made a bit of progress, thanks to nice people at MathOverflow.
The free group on two or more elements is not amenable. It’s a pleasant exercise to show this.
There are some groups whose status is famously unknown. This is so for Thompson’s group . Two papers on this question were arXived last year: one claiming to prove that is amenable, and the other claiming to prove that it is not. Danny Calegari had an excellent blog post on this affair. We also discussed it a bit here at the Café.
So in this second situation, the existence of means is a very substantial issue.
Arrow’s Theorem
So far we’ve discussed means of numbers. Now we’re going to discuss means of structures. Given a set and a bunch of group structures on , what is the mean group structure? Given a bunch of topologies on , what is the mean topology?
This might sound like madness. But there are some structures for which it is a reasonable question. Arrow’s Theorem concerns the case of total orders.
Here’s a standard way of narrating Arrow’s Theorem. In an election, there are a finite set of voters and a finite set of candidates. Each voter chooses a total order on . The outcome of the election is also to be a total order on . Is there any reasonable system for deciding the outcome on the basis of the votes? Arrow’s Theorem says ‘no’. And it’s a very strong ‘no’, in that the interpretation of the word ‘reasonable’ is dramatically generous.
When we come to define ‘reasonable’, there will be very few concessions to equality, either between voters or between candidates. Perhaps men’s votes count for more than women’s, or the government has used its power to rig the system in its favour. But it needn’t be anything reprehensible. Perhaps the voters are the members of a committee and the chair has a casting vote, so that not all voters are equal. Or perhaps the vote is on various options for constitutional change, and the system gives preference to the status quo.
You can find plenty of expositions of Arrow’s Theorem, but what follows is one that I haven’t seen before, using categorical language.
Fix finite sets (the voters) and (the candidates). Given a set , write for the set of total orders on . A mean or voting system for and is a function satisfying axioms i and ii below.
It will be convenient to write , so that an element of may be written as where each is a total order on the set of candidates.
Total orders have the following key property: when you choose a total order on , you also implicitly choose a total order on each subset of . So for any subset , there is a restriction map . This makes into a functor where is the powerset of , ordered by inclusion.
First axiom:
i. extends to a natural transformation
In other words, there is a natural transformation whose -component is . It’s not hard to show that there can be at most one such extension. (Use the fact that any total order on a subset of can be extended to a total order on itself.)
But what does this mean? In its traditional formulation, this axiom rejoices in the name of the Independence of Irrelevant Alternatives. (It’s a wonderful name, though in my opinion it’s hard to beat the splendour of Sylvester’s Law of Inertia.) Suppose there is an election with three candidates: . For some reason the election has to be re-run, with the same voters and the same candidates. In between the two elections, candidates and do nothing interesting, but candidate makes outrageous statements. The effect is that in the second election, no voter changes their opinion on whether is better than , but some voters change their opinion about . Independence of Irrelevant Alternatives says that in the outcomes, and are placed in the same order both times. (Candidate is the ‘irrelevant alternative’.)
Briefly put: the axiom says that the voting system is compatible with restricting to subsets of the candidates. This is why it has something to do with naturality.
There is a canonical natural transformation —the diagonal. Its component at sends to .
Second axiom: if everyone puts the candidates in the same order, then the outcome of the election also puts the candidates in that order. That is:
ii. .
This is not the axiomatization of voting systems to be found in most statements of Arrow’s Theorem, but unless I’m mistaken, it’s equivalent.
There are trivial means or voting systems: dictatorships. For any , there is a mean given by The corresponding natural transformation is the th projection, which is obviously a one-sided inverse of .
Do nontrivial means exist?
If there are only two candidates, yes. For example, choose any percentage , and declare the first candidate to be the winner iff they are preferred by percent of the electorate. But otherwise, no:
Theorem (Arrow) Let and be finite sets with and . Then every mean for and is trivial.
I’ve interpreted this as a theorem about voting, and that’s the most common interpretation. But I have a couple of reservations about that.
The first is that Arrow’s Theorem is, in a way, too strong. If you talk to someone in the street about what a fair voting system is, they’ll probably expect things about all voters having an equal voice and all candidates being treated equally. None of this comes into Arrow’s Theorem. It’s qualitative rather than quantitative.
Now, there are interesting quantitative aspects of voting theory, such as Condorcet’s paradox. Suppose that there are three candidates, , and , and three voters, , and , who vote as follows: Most voters put above , so the final outcome should do so too. But also, most voters put above , and most voters put above . So the outcome should put above above above . Zut alors! Clearly the only fair outcome ranks , and equally, an option that’s not available: the outcome has to be a total order.
But this difficulty has nothing to do with Arrow’s Theorem, since the hypotheses there don’t demand any kind of numerical fairness; they don’t demand that majority opinion prevails.
The second reservation is that real voting systems very seldom fit the hypotheses.
The input is unrealistic: as a voter, I’ve never had to place a total order on a list of candidates. Either I’ve simply had to put an X in a box, or I’ve had to write numbers in boxes—but there’s always been the option of leaving some boxes blank. In the latter case, it’s a total order on a subset of the set of candidates. I’m sure there are real situations in which you do have put a total order on the whole set of candidates, but it’s by no means the only possibility available.
The output is also unrealistic. In most political elections, a predetermined number of candidates will be elected, and the rest will not. (Often .) All that’s necessary is a fair way of deciding who the top candidates are. There’s no need to come up with an order relation.
Postscript: voting systems and enriched categories
The problem of finding good voting systems has been studied at great length. For example, a quick browse reveals a bunch of impossibility theorems related to Arrow’s: the Gibbard–Satterthwaite Theorem, the Duggan–Schwartz Theorem, the Holmström Theorem, … . I know nothing about all this. But perhaps there’s something to be gained by thinking about means of structures.
How do I want to be able to vote? Not by imposing a total order. There are probably a few candidates that I want to put near the top, and a few about which I know nothing, and a few I actively oppose and want to put near the bottom. For some pairs of candidates I have an opinion about who is better; for some I’d say they’re about the same; for some, I don’t know enough to compare them. So what I want is to impose a preorder. (A preorder is a reflexive transitive relation; it is a partial order if implies .)
In the definition above of mean or voting system, we used just one key property of total orders: a total order on a set induces a total order on each subset, in a functorial way. Preorders also have this key property. So we can replace the total orders functor by the preorders functor throughout, and ask, again:
Do nontrivial means exist?
As usual, I have to say what ‘trivial’ means. There is for each nonempty subset (which I like to think of as a cabal) a mean , defined as follows. Let , and write Then Call these the trivial means. They’re maybe not that trivial: e.g. the one in which is fair in every way, but in a large population it will very likely put the discrete (antichain) order on the set of candidates, making it useless.
Conjecture 1 For , every mean is trivial.
(If there is an infinite number of voters, there are nontrivial means got by choosing nontrivial filters on ; but we are supposing that is finite.)
Conjecture 2 Conjecture 1 has already been settled.
This stuff is well worked-over!
But this still isn’t really how I want to be able to vote. I don’t only want to be able to preorder the candidates; I want to be able to say how much I prefer this candidate to that one. Visualizing my ordered set as a Hasse diagram, I want to put lengths on the edges.
For example, the UK has a white supremacist party called the British National Party. If there’s ever a BNP candidate in an election that I’m voting in, I want not only to put that candidate at the bottom of the diagram: I want to assign a length of to all the edges going down to that bottom vertex. (And I reckon that if that possibility was available, far-right parties would find much less success in elections. The strong antipathy that most people have towards them would be taken into account by the voting system, which at present only allows us to ignore them.)
So, for each pair of candidates, I assign a non-negative real number , specifying how much I prefer to . That is, I place on the set of candidates a metric in the sense of Lawvere: a function satisfying and .
A metric on a set determines a metric on every subset. So we can define ‘mean’ in the usual way, imitating the case of total orders. And we can ask, yet again:
Do nontrivial means exist?
And the answer is emphatically yes! There’s even one that’s fair, in every sense of the word that I can think of. At the election, each voter chooses a metric on . Writing for the set of metrics on , this gives an element Now define by What could be fairer than that?
Of course, this is a bit tongue-in-cheek. In real elections, not everyone manages to successfully put an X in a box; so it could be argued that it is too much to expect every citizen to specify a Lawvere metric. And you can imagine the candidates standing around the day after the election, saying “OK, the voters have metrized us. Now: who gets to be president?”
But ignoring this…
…the theory here is really about enriched categories. Preorders are categories enriched in the two-element ordered set . Metric spaces (in this generalized sense) are categories enriched in the ordered set . In both cases, some monoidal category has been fixed, and each voter chooses a -category structure on (as object-set). And enriched categories have the key property: a -category structure on a set determines a -category structure on every subset.
So, let be a monoidal category. Fix finite sets (voters) and (candidates).
Given a set , write for the set of -category structures on the object-set . Then is a functor in a natural way.
The diagonal functor is lax (indeed, strict) monoidal, so induces a functor . A -category structure on a set is just an -tuple of -category structures on , so there is an induced natural transformation A mean for and is a natural transformation such that .
For both the examples above—preorders and metric spaces—I used the following method to construct the means. Suppose that we have a lax monoidal functor such that . Then, just as the functor induced the natural transformation , the functor induces a natural transformation . And functoriality gives : so is a mean.
When , such an is a filter on (like an ultrafilter, but without the axiom that every subset is either large or small). For example, every subset determines a filter : a subset is large (i.e. ) iff . The resulting mean is the ‘cabal’ mean defined above.
(Using finiteness of , you can show that these are the only filters on . But that doesn’t close off the possibility that there are other means on , not arising in this way. That’s Conjecture 1.)
When , such an is an order-preserving function satisfying and . The resulting mean is given by where (). Among these s are those for which the inequality is an equality, and they are precisely the weighted means in the usual real-number sense: functions of the form where and . And among these is the standard mean, with . This gives the fair voting system described above.
I have no idea how you might take the mean of sets, or of vector spaces, or of objects of other popular enriching categories .
Re: Means
Two unrelated comments:
(1) I don’t have the link, but I think there is some related discussion a while back over on Tao’s blog comparing Arrow’s theorem to (non)existence of nontrivial ultrafilters.
(2) Another concern when designing a voting system is whether it creates an incentive for voters to misrepresent their opinions. For example, in your metric voting, where the computer takes the mean, any party that I have ranked infinitely lower than another party will not win. If I am a partisan, I might reasonably decide that I’d rather no party win, and rank everyone except my party infinitely low, with the understanding that someone else will do the same for their party. This leads to a situation, if I’m understanding your algorithm correctly, in which there is no government. Note that if someone else does as I say, then I should as well, and conversely if no one else does, then I might as well, because it makes my party win. So voters are forced to play a Prisoners’ Dilemma game. An example of where parties make such decisions is the US Congress.