On the Law of Large Numbers (Such As 60)
Posted by Tom Leinster
I spent yesterday morning in the computer science department at Strathclyde, for the 60th birthday celebrations of Peter Hancock. Often these events are months away from the person’s actual birthday, but in this case it was only one day off. Happy birthday, Hank!
One of the talks I went to was by Alex Simpson, the title of which was too good not to use as the title of this post. Simpson’s recent work has re-examined foundational questions about probability and randomness. It’s also the most exciting use of locales I think I’ve ever encountered. I want to tell you something about it here.
Let me begin by making clear that all mistakes here are definitely mine. The chances of mistakes are higher than usual, as I don’t understand this stuff nearly as well as I’d like to. In particular, Simpson’s work has made me realize that my understanding of locales is not as good as I’d like. If you want correct, precise statements, you should read his paper Measure, randomness and sublocales, which covers much of what I’m going to say.
We’ll get to locales in a bit. First, though, a basic question:
What is a random sequence?
If someone tells you that the binary digits of $\pi$ are thought to behave like a random sequence, you know roughly what they mean, even though the binary digits of $\pi$ aren’t random at all. Probably they’re using “random” to include properties such as “there are as many 0s as 1s” or “the time you have to wait before the appearance of a given $k$-digit block is on average the same as it would be for a random sequence” — that kind of thing. There’s some intuitive understanding here, perhaps one that can be made precise in several different ways, but an intuition all the same.
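To make those intuitions concrete, here's a small Python sketch (my own illustration, not from the talk) that checks a pseudorandom 0/1 sequence for the two properties just mentioned: the balance of 0s and 1s, and the frequency with which a fixed 3-digit block appears.

```python
import random

random.seed(0)
bits = [random.randint(0, 1) for _ in range(100_000)]

# Property 1: "there are as many 0s as 1s" -- the fraction of 1s
# should be close to 1/2.
frac_ones = sum(bits) / len(bits)

# Property 2: a fixed k-digit block should appear with the frequency
# a random sequence would give it, namely 2^(-k) per position.
block = (0, 1, 1)
positions = len(bits) - len(block) + 1
hits = sum(
    1 for i in range(positions) if tuple(bits[i:i + len(block)]) == block
)
freq_block = hits / positions

print(round(frac_ones, 2))   # close to 0.5
print(round(freq_block, 2))  # close to 1/8 = 0.125
```

Of course, passing a couple of such tests is a long way from a definition of randomness; that's exactly the gap the rest of the post addresses.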
Simpson takes as his model scenario an infinite sequence of fair coin tosses. The set of all sequences of coin tosses is
$$2^{\mathbb{N}},$$
where $2$ is a two-element set. As a coin belonging to a theoretical computer scientist, its faces are labelled $0$ and $1$, not heads and tails; so the two elements of $2$ will be called $0$ and $1$.
We’ll need two pieces of structure on the set $2^{\mathbb{N}}$.
Topology  Give $2$ the discrete topology, and give $2^{\mathbb{N}}$ the resulting product topology. The topological space $2^{\mathbb{N}}$ is then called the Cantor set (or more logically, Cantor space). People often think of the Cantor set as embedded in $[0, 1]$, via the map sending $(a_1, a_2, \ldots)$ to $\sum_{i} 2 a_i / 3^i$, but we won’t need that viewpoint here.
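For concreteness, here's a quick sketch (my own, not from the talk) of that middle-thirds embedding: a finite prefix $(a_1, \ldots, a_n)$ is sent to the partial sum $\sum_{i=1}^{n} 2 a_i / 3^i$, which approximates the image of any infinite sequence extending it.

```python
def cantor_point(prefix):
    """Partial sum of 2 * a_i / 3^i: where the classical middle-thirds
    embedding sends a sequence beginning with `prefix`."""
    return sum(2 * a / 3**i for i, a in enumerate(prefix, start=1))

print(cantor_point([0, 0, 0, 0]))  # 0.0: the all-0s sequence maps to 0
print(cantor_point([1] * 30))      # just under 1: the all-1s sequence maps to 1
# Sequences starting with 0 land in [0, 1/3]; those starting with 1 land in [2/3, 1],
# which is why the image is the familiar middle-thirds Cantor set.
```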
Simpson emphasized the following influential idea: in informatic terms, open subsets can be regarded as observable properties. I don’t know much about the history of this idea. Alex told me afterwards that it was made very explicit in papers of Mike Smyth, under the name of “semidecidable property”. The name “observable property” is due to Samson Abramsky, I believe.
The Cantor set provides a good example. Imagine someone tossing a coin repeatedly, while you stand there watching. Because you’re a finite creature, all you can observe is how the sequence of coin-tosses starts. Now, a subset $U$ of $2^{\mathbb{N}}$ is open if and only if it has the following property: for all $\sigma \in U$, there exists $n \in \mathbb{N}$ such that
$$\{ \tau \in 2^{\mathbb{N}} : \tau_1 = \sigma_1, \ldots, \tau_n = \sigma_n \} \subseteq U.$$
That’s just how the product topology is defined, but you can see that it also embodies the idea that a set is open if it consists of all sequences satisfying some observable property.
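Here's a toy illustration of that reading (a sketch of mine, not from the talk): take an open set given as a union of “cylinders” — sets of all sequences extending some finite prefix — and watch tosses one at a time. Membership in the open set can be confirmed after finitely many observations; non-membership, in general, can't.

```python
def confirm_membership(cylinder_prefixes, tosses):
    """Watch `tosses` one at a time; return how many observations it took
    to confirm that the sequence lies in the open set given by the union
    of cylinders over `cylinder_prefixes`, or None if this finite sample
    never confirmed it."""
    seen = []
    for t in tosses:
        seen.append(t)
        if tuple(seen) in cylinder_prefixes:
            return len(seen)
    return None

# U = all sequences starting 0, together with all sequences starting 1, 0.
U = {(0,), (1, 0)}
print(confirm_membership(U, [0, 1, 1, 0]))  # 1: confirmed after one toss
print(confirm_membership(U, [1, 0, 0, 0]))  # 2: confirmed after two tosses
print(confirm_membership(U, [1, 1, 0, 0]))  # None: membership never confirmed
```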
Measure  The Cantor set carries a canonical measure. It’s the unique probability measure with the obvious symmetry and self-similarity properties. For example, it gives measure $1/8$ to the set of all sequences beginning $0, 1, 1$. I’ll call this measure $\gamma$.
Aside  As far as a measure theorist is concerned, $2^{\mathbb{N}}$ with its measure $\gamma$ is the same as $[0, 1]$ with Lebesgue measure. Indeed, there’s a surjection $f \colon 2^{\mathbb{N}} \to [0, 1]$ sending a sequence $(a_1, a_2, \ldots)$ to the real number with binary expansion $0.a_1 a_2 \cdots$. It’s not quite injective, because numbers of the form $k/2^n$ have more than one binary expansion. On the other hand, the set of such numbers has measure zero, and every other real number in $[0, 1]$ has just one binary expansion. So from a measure-theoretic point of view, $f$ might as well be a bijection. Also, $f$ turns the measure $\gamma$ on $2^{\mathbb{N}}$ into Lebesgue measure on $[0, 1]$. For example, the image under $f$ of the set of sequences beginning $0, 1, 1$ is $[3/8, 1/2]$, which has measure $1/8$. So you could replace $2^{\mathbb{N}}$ by $[0, 1]$ if you wanted.
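As a sanity check on that aside, here's a small sketch (mine, not from the talk) computing both sides for a cylinder set: its measure in $2^{\mathbb{N}}$, and the length of its image interval under the binary-expansion map.

```python
def cylinder_measure(prefix):
    """Canonical measure of the set of sequences beginning with `prefix`:
    1/2 per specified coin toss."""
    return 2.0 ** -len(prefix)

def image_interval(prefix):
    """Image under the binary-expansion map of the cylinder on `prefix`:
    the interval [0.a1...an, 0.a1...an + 2^-n]."""
    low = sum(a / 2**i for i, a in enumerate(prefix, start=1))
    return low, low + 2.0 ** -len(prefix)

lo, hi = image_interval([0, 1, 1])
print(lo, hi)                                  # 0.375 0.5, i.e. [3/8, 1/2]
print(hi - lo == cylinder_measure([0, 1, 1]))  # True: both equal 1/8
```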
Now let’s return to the question:
What is a random sequence?
Here’s Simpson’s bold idea. We think of a sequence as ‘random’ — remember the example of $\pi$ — if it has all ‘generic’ properties. In other words, a sequence is random if it belongs to all subspaces of $2^{\mathbb{N}}$ of measure one. So, he defines the space $\mathrm{Ran}$ of random sequences to be the intersection of all measure-one subspaces of $2^{\mathbb{N}}$.
This is so bold that it’s obviously wrong. No sequence has all generic properties: any given sequence $\sigma$, no matter how ‘random’ it might look, fails to have the generic property of not being equal to $\sigma$. That is, any given sequence $\sigma$ fails to belong to the measure-one subspace $2^{\mathbb{N}} \setminus \{\sigma\}$.
But this doesn’t deter Simpson, because he knows the following: just because a space has no points doesn’t mean it’s trivial.
If you know about locales, you’ll have guessed that they’re how Simpson makes sense of this. If you don’t know about locales, keep reading anyway, because I’ll tell you now.
The idea is childishly simple. In a topological space, the open sets form a partially ordered set with respect to inclusion. This poset has a few special properties: every pair of elements has a greatest lower bound (since the intersection of two open sets is open), every family of elements has a least upper bound (since any union of open sets is open), and binary meets distribute over arbitrary joins. A locale is a poset with these properties.
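To make those properties tangible, here's a small sketch (my own, using an arbitrary example topology) verifying the frame laws for the poset of open sets of a three-point space.

```python
from itertools import product

# Opens of a topology on {0, 1, 2}, ordered by inclusion.
opens = [frozenset(), frozenset({0}), frozenset({0, 1}), frozenset({0, 1, 2})]

# Meets and joins exist: intersections and unions of opens are open ...
for U, V in product(opens, repeat=2):
    assert U & V in opens and U | V in opens

# ... and meets distribute over joins.  (In a general frame, binary meets
# must distribute over arbitrary joins; in this finite example, checking
# binary joins suffices.)
for U, V, W in product(opens, repeat=3):
    assert U & (V | W) == (U & V) | (U & W)

print("frame laws hold for this poset of opens")
```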
Why bother? Partly because of applications like the one I’m about to tell you, but also for much more basic reasons. Many mathematicians are unhappy with topological spaces in which the points can’t be well separated by open sets (e.g. non-Hausdorff spaces). There’s an instinct that that’s not in the true spirit of topology. For example, indiscrete spaces are non-separated in the most extreme way. To some extent, locales relieve this problem: e.g. all nonempty indiscrete spaces give rise to the same locale, the two-element totally ordered set.
Although points play no part in the definition of locale, there is a definition of ‘point of a locale’. That is, every locale has an underlying set of points. I won’t tell you what the definition is, but as a sanity check, if you start with a Hausdorff topological space $X$, the points of the resulting locale are exactly the points of $X$.
Now here’s the crucial thing: there are useful locales with no points at all. For example, there’s a ‘locale of surjections $\mathbb{N} \to \mathbb{R}$’. Without saying anything about what that locale is, it’s clear that it can’t have any points, since there are no surjections $\mathbb{N} \to \mathbb{R}$.
Simpson’s locale of random sequences, which I’ll define in a moment, is another locale with no points. No individual sequence is random, but there’s a meaningful space of random sequences.
By definition, $\mathrm{Ran}$ is the intersection of all measure-one open sublocales of $2^{\mathbb{N}}$. It’s also equal to the intersection of all measure-one not-necessarily-open sublocales of $2^{\mathbb{N}}$. To understand those statements, you evidently need to understand what ‘sublocale’ means, but you can appreciate the idea anyway: a sequence is random if it’s contained in every measure-one subspace.
Another way to say it is that
$$\mathcal{O}(\mathrm{Ran}) = \mathcal{O}(2^{\mathbb{N}}) / {\sim}.$$
Here $\mathcal{O}(2^{\mathbb{N}})$ is the poset of open subsets of $2^{\mathbb{N}}$, and the equivalence relation $\sim$ is given by $U \sim V$ if and only if $\gamma(U \triangle V) = 0$. Remember that $\gamma$ is the canonical measure on $2^{\mathbb{N}}$. Also, I’m using $\triangle$ to denote symmetric difference. So this is a standard idea in measure theory: identify two sets if their symmetric difference has measure zero.
I still haven’t got to the Law of Large Numbers — henceforth, the LLN.
For a sequence $(a_1, a_2, \ldots)$ of zeros and ones, the LLN says that
$$\lim_{n \to \infty} \frac{a_1 + \cdots + a_n}{n} = \frac{1}{2}.$$
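The LLN limit (the running average of the first $n$ tosses tending to $1/2$) is easy to watch empirically. Here's a quick simulation of fair coin tosses (mine, not from the talk):

```python
import random

random.seed(1)
bits = [random.randint(0, 1) for _ in range(200_000)]

# Record the running average (a_1 + ... + a_n) / n at a few checkpoints.
total = 0
averages = {}
for n, a in enumerate(bits, start=1):
    total += a
    if n in (10, 1_000, 200_000):
        averages[n] = total / n

print(averages)  # the average at n = 200_000 should be close to 0.5
```

Almost every sequence behaves this way, but, as the post goes on to explain, "almost every" is exactly where the subtlety lies.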
Do all random sequences satisfy the LLN? In other words, is the set of sequences satisfying the LLN the whole of the locale $\mathrm{Ran}$?
Well… we have a problem. The set of sequences satisfying the LLN is some subset of $2^{\mathbb{N}}$, and not even an open subset at that. It doesn’t make sense to ask whether it’s the whole of $\mathrm{Ran}$. Quite simply, that set and $\mathrm{Ran}$ are objects of different types. So how do we begin to compare them?
Much of Simpson’s talk was about this. He took advantage of the fact that the set of sequences satisfying the LLN (call it $L$) is a Borel set in $2^{\mathbb{N}}$. Recall that in any topological space, the class of Borel sets is, by definition, the smallest class of subsets containing all the open sets and closed under complements and countable unions. The Borel sets are, therefore, also closed under countable intersections. Now, unwinding the definition of limit, the set of sequences satisfying the law of large numbers is
$$L = \bigcap_{\varepsilon > 0} \bigcup_{N \in \mathbb{N}} \bigcap_{n \geq N} \left\{ a \in 2^{\mathbb{N}} : \left| \frac{a_1 + \cdots + a_n}{n} - \frac{1}{2} \right| < \varepsilon \right\}.$$
This doesn’t quite look like a Borel set, because the first intersection is over an uncountable collection of $\varepsilon$s. On the other hand, it makes no difference if we restrict $\varepsilon$ to lie in the countable set $\{1, 1/2, 1/3, \ldots\}$, say. So $L$ is Borel.
I said earlier that points play no part in the definition of locale, but you can still define ‘point of a locale’. Similarly, Borel sets play no part in the definition of locale, but you can still define ‘Borel set in a locale’. In particular, this gives the notion of ‘Borel set in $\mathrm{Ran}$’. There’s a particular Borel set in $\mathrm{Ran}$ consisting of all the sequences satisfying the LLN. I’ll call that Borel set $L$, too.
The question is, then: is $L = \mathrm{Ran}$?
Simpson stated an informal version of this question near the beginning of his talk, and spent most of the middle part setting up the language to state it precisely. It was only in the last few minutes that he answered it. Like a Hollywood movie whose ending is obvious to everyone in the cinema, it was clear that he wouldn’t have bothered doing all this if the answer was no. So a ripple of surprise went round the room when he told us that the answer is, in fact, no.
What’s going on? The moral, he said, is that the LLN is not derivable from observable properties of randomness. You can only observe a sequence of coin tosses for a finite amount of time, and that tells you nothing about its long-term behaviour. You can’t gather any evidence about whether the sequence will satisfy the LLN. This relates to the fact that the open sets in , which make up our locale, correspond to the observable properties of sequences.
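Here's one way to see that concretely (a sketch of mine, not from the talk): whatever finite prefix you've observed, it extends both to a sequence satisfying the LLN and to a sequence violating it, so no finite observation can settle the matter.

```python
def average_with_tail(prefix, tail_bit_at, total_len):
    """Average of the first `total_len` terms of the sequence that starts
    with `prefix` and continues with tail_bit_at(i) for i >= len(prefix)."""
    total = sum(prefix)
    total += sum(tail_bit_at(i) for i in range(len(prefix), total_len))
    return total / total_len

prefix = [1, 1, 1, 1, 1, 1, 1, 1]  # any finite observation whatsoever

# Continue with alternating bits: the running averages tend to 1/2, so
# the LLN holds for this extension ...
lln_holds = average_with_tail(prefix, lambda i: i % 2, 1_000_000)
# ... or continue with all 0s: the averages tend to 0, so the LLN fails.
lln_fails = average_with_tail(prefix, lambda i: 0, 1_000_000)

print(lln_holds)  # very close to 0.5
print(lln_fails)  # very close to 0
```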
(Actually, when he told us the ‘moral’, he mis-spoke and called it the ‘morel’, to rhyme with Borel. This is, as someone in the audience pointed out, a kind of mushroom.)
The very end of the talk was a precise statement of the following slogan: LLN is ‘true’ in the sense that it’s consistent with observable facts. As I understand it, this is the strongest positive statement that could possibly be made once you’ve understood the limited power of finite-time observation. But for the details of that, you might have to wait for Alex’s next paper.
Re: On the Law of Large Numbers (Such As 60)
For no good reason at all, I was looking at this interesting webpage which includes, among other things, the alarming pieces of Haskell
…
and
… The internet rhymes, and that is nifty.