Where Do Probability Measures Come From?
Posted by Tom Leinster
Guest post by Tom Avery
Tom (here Tom means me, not him — Tom) has written several times about a piece of categorical machinery that, when given an appropriate input, churns out some well-known mathematical concepts. This machine is the process of constructing the codensity monad of a functor.
In this post, I’ll give another example of a well-known concept that arises as a codensity monad; namely probability measures. This is something that I’ve just written a paper about.
The Giry monads
Write for the category of measurable spaces (sets equipped with a -algebra of subsets) and measurable maps. I’ll also write for the unit interval , equipped with the Borel -algebra.
Let . There are lots of different probability measures we can put on ; write for the set of all of them.
Is a measurable space? Yes: An element of is a function that sends measurable subsets of to numbers in . Turning this around, we have, for each measurable , an evaluation map . Let’s give the smallest -algebra such that all of these are measurable.
Is a functor? Yes: Given a measurable map and , we can define the pushforward of along by
for measurable .
Is a monad? Yes: Given we can define by
where is a measurable subset of and is its characteristic function. In other words is the Dirac measure at . Given , let
for measurable , where is as above.
This is the Giry monad , first defined (unsurprisingly) by Giry in “A categorical approach to probability theory”.
A finitely additive probability measure is just like a probability measure, except that it is only well-behaved with respect to finite disjoint unions, rather than arbitrary countable disjoint unions. More precisely, rather than having
for disjoint , we just have
for disjoint .
We could repeat the definition of the Giry monad with “probability measure” replaced by “finitely additive probability measure”; doing so would give the finitely additive Giry monad . Every probability measure is a finitely additive probability measure, but not all finitely additive probability measures are probability measures. So is a proper submonad of .
The Kleisli category of is quite interesting. Its objects are just the measurable spaces, and the morphisms are a kind of non-deterministic map called a Markov kernel or conditional probability distribution. As a special case, a discrete space equipped with an endomorphism in the Kleisli category is a discrete-time Markov chain.
I’ll explain how the Giry monads arise as codensity monads, but first I’d like to mention a connection with another example of a codensity monad; namely the ultrafilter monad.
An ultrafilter on a set is a set of subsets of satisfying some properties. So is a subset of the powerset of , and is therefore determined by its characteristic function, which takes values in . In other words, an ultrafilter on can be thought of as a special function
It turns out that “special function” here means “finitely additive probability measure defined on all of and taking values in ”.
So the ultrafilter monad on (which sends a set to the set of ultrafilters on it) is a primitive version of the finitely additive Giry monad. With this in mind, and given the fact that the ultrafilter monad is the codensity monad of the inclusion of the category of finite sets into the category of sets, it is not that surprising that the Giry monads are also codensity monads. In particular, we might expect to be the codensity monad of some functor involving spaces that are “finite” in some sense, and for we’ll need to include some information pertaining to countable additivity.
Integration operators
If you have a measure on a space then you can integrate functions on that space. The converse is also true: if you have a way of integrating functions on a space then you can extract a measure.
There are various ways of making this precise, the most famous of which is the Riesz-Markov-Kakutani Representation Theorem:
Theorem. Let be a compact Hausdorff space. Then the space of finite, signed Borel measures on is canonically isomorphic to
as a normed vector space, where is the category of topological spaces, and is the category of normed vector spaces.
Given a finite, signed Borel measure on , the corresponding map sends a function to its integral with respect to . There are various different versions of this theorem that go by the same name.
My paper contains the following more modest version, which is a correction of a claim by Sturtz.
Proposition. Finitely additive probability measures on a measurable space are canonically in bijection with functions that are
- affine: if and then
and
- weakly averaging: if denotes the constant function with value then .
Call such a function a finitely additive integration operator. The bijection restricts to a correspondence between (countably additive) probability measures and functions that additionally
- respect limits: if is a sequence of functions converging pointwise to then converges to .
Call such a function an integration operator. The integration operator corresponding to a probability measure sends a function to
which justifies the name. In the other direction, given an integration operator , the value of the corresponding probability measure on a measurable set is .
These bijections are measurable (with respect to a natural -algebra on the set of finitely additive integration operators) and natural in , so they define isomorphisms of endofunctors of . Hence we can transfer the monad structures across the isomorphisms, and obtain descriptions of the Giry monads in terms of integration operators.
The Giry monads via codensity monads
So far so good. But what does this have to do with codensity monads? First let’s recall the definition of a codensity monad. I won’t go into a great deal of detail; for more information see Tom’s first post on the topic.
Let be a functor. The codensity monad of is the right Kan extension of along itself. This consists of a functor satisfying a universal property, which equips with a canonical monad structure. The codensity monad doesn’t always exist, but it will whenever is small and is complete. You can think of as a generalisation of the monad induced by the adjunction between and its left adjoint that makes sense when the left adjoint doesn’t exist. In particular, when the left adjoint does exist, the two monads coincide.
The end formula for right Kan extensions gives
where denotes the power of in , i.e. the product of (a set) copies of (an object of ) in .
It doesn’t matter too much if you’re not familiar with ends because we can give an explicit description of in the case that : The elements of are families of functions
that are natural in . For each and measurable we have mapping to . The -algebra on is the smallest such that each of these maps is measurable.
All that’s left is to say what we should choose and to be in order to get the Giry monads.
A subset of a real vector space is convex if for any and the convex combination is also in , and a map between convex sets is called affine if it preserves convex combinations. So there’s a category of convex sets and affine maps between them. We will be interested in certain full subcategories of this.
Let be the (convex) set of sequences in that converge to (it is a subset of the vector space of all real sequences converging to ). Now we can define the categories of interest:
Let be the category whose objects are all finite powers of , with all affine maps between them.
Let be the category whose objects are all finite powers of , together with , and all affine maps between them.
All the objects of and can be considered as measurable spaces (as subspaces of powers of ), and all the affine maps between them are then measurable, so we have (faithful but not full) inclusions and .
Theorem. The codensity monad of is the finitely additive Giry monad, and the codensity monad of is the Giry monad.
Why should this be true? Let’s start with . An element of is a family of functions
But a map into is determined by its composites with the projections to , and these projections are affine. This means that is completely determined by , and the other components are obtained by applying separately in each coordinate. In other words, an element of is a special sort of function
Look familiar? As you might guess, the functions with the above domain and codomain that define elements of are precisely the finitely additive integration operators.
The affine and weakly averaging properties of are enforced by naturality with respect to certain affine maps. For example, the naturality square involving the affine map
(where are the projections) forces to preserve convex combinations of the form . The weakly averaging condition comes from naturality with respect to constant maps.
How is the situation different for ? As before is determined by , and is obtained by applying in each coordinate, thanks to naturality with respect to the projections. A measurable map is a sequence of maps converging pointwise to , and
But , so must converge to . So is an integration operator!
The rest of the proof consists of checking that these assignments really do define isomorphisms of monads.
It’s natural to wonder how much you can alter the categories and without changing the codensity monads. Here’s a result to that effect:
Proposition. The categories and can be replaced by the monoids of affine endomorphisms of and respectively (regarded as 1-object categories, with the evident functors to ) without changing the codensity monads.
This gives categories of convex sets that are minimal such that their inclusions into give rise to the Giry monads. Here I mean minimal in the sense that they contain the fewest objects with all affine maps between them. They are not uniquely minimal; there are other convex sets whose monoids of affine endomorphisms also give rise to the Giry monads.
This result gives yet another characterisation of (finitely and countably) additive probability measures: a probability measure on is an -set morphism
where is the monoid of affine endomorphisms of . Similarly for finitely additive probability measures, with replaced by .
What about maximal categories of convex sets giving rise to the Giry monads? I don’t have a definitive answer to this question, but you can at least throw in all bounded, convex subsets of Euclidean space:
Proposition. Let be the category of all bounded, convex subsets of (where varies) and affine maps. Let be but with adjoined. Then replacing by and by does not change the codensity monads.
The definition of is a bit unsatisfying; feels (and literally is) tacked on. It would be nice to have a characterisation of all the subsets of (or indeed all the convex sets) that can be included in . But so far I haven’t found one.
Re: Where Do Probability Measures Come From?