Skip to the Main Content

Note:These pages make extensive use of the latest XHTML and CSS Standards. They ought to look great in any standards-compliant modern browser. Unfortunately, they will probably look horrible in older browsers, like Netscape 4.x and IE 4.x. Moreover, many posts use MathML, which is, currently only supported in Mozilla. My best suggestion (and you will thank me when surfing an ever-increasing number of sites on the web which have been crafted to use the new standards) is to upgrade to the latest version of your browser. If that's not possible, consider moving to the Standards-compliant and open-source Mozilla browser.

July 19, 2022

Probability Monads as Codensity Monads

Posted by Tom Leinster

My PhD student Ruben Van Belle has just published his first paper!

Ruben Van Belle, Probability monads as codensity monads. Theory and Applications of Categories 38 (2022), 811–842.

It’s a treasure trove of theorems demonstrating how many of the monads loosely referred to as “probability monads” arise as codensity monads in a certain uniform manner, which I’ll tell you about now.

A probability monad is a monad that takes a space XX as input and produces as output the space of probability measures on XX. It’s an imprecise term, but there are a number of monads that are universally agreed to count as probability monads, of which the Giry monad is the best known. There are also some boundary cases: should the ultrafilter monad count as a probability monad? (An ultrafilter is a kind of primitive probability measure.)

The codensity monad of a functor G:BAG: \mathbf{B} \to \mathbf{A} is a certain canonical monad on A\mathbf{A}. If GG has a left adjoint FF, then the codensity monad of GG is the composite GFG F. But even if GG doesn’t have a left adjoint, its codensity monad is often well-defined. By definition, it’s the right Kan extension of GG along itself. A good non-trivial example: the codensity monad of the inclusion FinSetSet\mathbf{FinSet} \hookrightarrow \mathbf{Set} is the ultrafilter monad.

Ruben’s insight was as follows. There is something fundamentally countable about probability theory. For instance, probability is additive on countable families of mutually exclusive events, but not general families. You can define a probability measure on a countable set XX as an XX-indexed family of nonnegative reals summing to 11, but that definition doesn’t work when XX is larger. And — more nebulously — although probability theory very often involves uncountable spaces such as n\mathbb{R}^n and uncountable families of random variables (arising in stochastic processes), there is a feeling that the heart of probability theory remains countable, that the extension from countable to uncountable infinities is somehow automatic.

So, take some category of spaces A\mathbf{A} and a probability monad TT on it. Let A c\mathbf{A}_c be the subcategory of A\mathbf{A} consisting of just the countable spaces. Then you can restrict TT to A c\mathbf{A}_c to get a functor G:A cAG: \mathbf{A}_c \to \mathbf{A}. If the idea expounded in the last paragraph is correct, then it should be possible to reconstruct TT from GG (its restriction to countable spaces) in some automatic way. The substance of the definition of TT should be in the countable part A c\mathbf{A}_c of A\mathbf{A}; everything else should be fluff.

Ruben shows that this is indeed the case. And as you’ll probably have guessed, the way that TT is reconstructed from GG is as its codensity monad. That is, in many cases:

A probability monad is the codensity monad of its restriction to countable spaces.

Sometimes it’s even simpler: you can get away with just finite spaces. Sometimes there are some complicating wrinkles. But that’s the main idea.

Here are four theorems illustrating this principle, all from Ruben’s paper.

  • Let A\mathbf{A} be the category of measurable spaces and TT the Giry monad on A\mathbf{A}, which assigns to each space XX the space T(X)T(X) of probability measures on it. Then TT is the codensity monad of the functor from countable sets to A\mathbf{A} that assigns to each countable set the space of probability measures on it.

    This theorem embodies the idea that once you have the definition of probability measure on a countable set, the definition for general measurable spaces is automatic.

  • Let A\mathbf{A} be the category of compact Hausdorff spaces and TT the Radon monad on A\mathbf{A}, which assigns to each space XX the space T(X)T(X) of Radon probability measures on it. (A Radon measure is a suitably nice measure on the Borel sets, i.e. the smallest σ\sigma-algebra containing the open sets.) Then TT is the codensity monad of the functor GG from finite sets to A\mathbf{A} that assigns to each finite set XX the space of probability measures on it.

    So for compact Hausdorff spaces, once you’ve got the definition of probability measure on finite sets, the general definition of Radon probability measure follows automatically. A striking feature of this theorem was that although the input functor GG was entirely finite in nature, the output monad involves countable additivity. We got that for free.

  • Let A\mathbf{A} be the category of compact metric spaces and distance-decreasing maps. Let TT be the “bounded Lipschitz monad” on A\mathbf{A}. It assigns to each space XX the space T(X)T(X) of Radon probability measures on it, with a certain metric called the “bounded Lipschitz metric”. This is closely related to the Wasserstein/Kantorovich metric, and you can find the definition on page 829 of Ruben’s paper. And again, TT is the codensity monad of its restriction to finite sets.

  • Finally, let A\mathbf{A} be the category of Hausdorff spaces. There is a probability monad TT on A\mathbf{A} that Ruben dubs the Baire monad, assigning to each space XX the space of “Baire probability measures” on XX. I won’t give the definition (see page 825 if you want it), but the point is that the general notion of Baire probability measure reduces to the usual notion of probability measure when XX is metric or compact Hausdorff, but still behaves well on the category of Hausdorff spaces as a whole.

    In this case, the result is that TT is the codensity monad of its restriction to a certain category of countable discrete spaces (i.e. countable sets). It’s not the full subcategory; only certain maps are included. But I’ll leave Ruben to explain that; it’s in section 6 of his paper.

All of this provokes a categorical question. Given a monad TT on a category A\mathbf{A} and a subcategory B\mathbf{B} of A\mathbf{A}, when is it the case that TT is the codensity monad of its restriction to B\mathbf{B}? Or more generally, given a monad TT on A\mathbf{A} and a functor G:BAG: \mathbf{B} \to \mathbf{A}, when is it the case that TT is the codensity monad of the composite functor

BGATA? \mathbf{B} \stackrel{G}{\to} \mathbf{A} \stackrel{T}{\to} \mathbf{A}?

We know the answer when TT is the identity monad. For in that case, the question is asking “for which functors GG is the codensity monad of GG the identity?”, and the answer is “when GG is codense”. It’s not too hard to show that in general, TT is the codensity monad of TGT G if and only if the composite functor

BGAA T \mathbf{B} \stackrel{G}{\to} \mathbf{A} \to \mathbf{A}_T

is codense, where AA T\mathbf{A} \to \mathbf{A}_T is the free functor into the Kleisli category of TT. I’m sure Ruben and I aren’t the first to come across this fact, but can anyone point to where it appears in the literature? And does anyone know other applications of it?

Posted at July 19, 2022 9:47 AM UTC

TrackBack URL for this Entry:   https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/3408

8 Comments & 0 Trackbacks

Re: Probability Monads as Codensity Monads

Well done, Ruben!

Is there some abstract way to relate this codensity account to the graded monad account of Fritz and Perrone, as described here at nLab: graded monad?

…taking the left Kan extension of the graded list monad B[Set,Set]\mathbf{B} \mathbb{N} \to [Set,Set] described above results in the usual list monad on SetSet, given by a lax monoidal functor 1[Set,Set]1 \to [Set,Set]. Based on a similar construction on the category of complete metric spaces, (Fritz & Perrone 17) have contructed a monad of Radon probability measures without any appeal to measure theory; the intuitive idea being that a probability measure can be thought of as an idealized version of a finite sample, and spaces of finite samples make up a graded monad. Forgetting the grading by taking the above Kan extension then produces the Kantorovich monad, containing all Radon probability measures of finite first moment. This construction reduces certain problems in measure-theoretic probability to purely combinatorial problems.

Posted by: David Corfield on July 19, 2022 1:40 PM | Permalink | Reply to this

Re: Probability Monads as Codensity Monads

Thank you for your comment.

I don’t immediately see a way to relate these two constructions. In the construction of the Kantorovich monad of Fritz and Perrone, the space of measures is constructed as a colimit of finitely supported measures and relies on using complete metric spaces. In the codensity construction the space of measures is formed as a limit of countably supported measures, but doesn’t need completeness of the spaces.

Posted by: Ruben Van Belle on July 25, 2022 11:46 AM | Permalink | Reply to this

Re: Probability Monads as Codensity Monads

In the case of the inclusion ι:Meas fMeas\iota:\mathbf{Meas}_f \rightarrow \mathbf{Meas} the right Kan extension Ran ι(𝒢ι)Ran_{\iota}(\mathcal{G} \circ\iota) is, as you know, is just the natural transformation μ:𝒢 2𝒢\mu:\mathcal{G}^2 \Rightarrow \mathcal{G} restricted to the subcategory Meas f\mathbf{Meas}_f. Tacking on another 𝒢\mathcal{G} to ι\iota doesn’t change anything - the universal arrow Ran ι(𝒢ι)ιRan_{\iota}(\mathcal{G} \circ \iota) \Rightarrow \iota still factors through the multiplication natural tranformation μ\mu. I don’t think replacing the inclusion map ι\iota by GG changes the answer - does that universal arrow factor through μ\mu?

I agree that the essence of the Giry monad (and integration) is countability which is the beauty of what Ruben’s result implies. That is because the fundamental Giry algebra is the measurable function ϵ N:𝒢(N)N\epsilon_{N}: \mathcal{G}(N) \rightarrow N, where NN is the set of natural numbers with the powerset σ\sigma-algebra, specified by iNp iδ imin i{i|p i>0} \sum_{i \in N} p_i \delta_i \mapsto min_i \{i | p_i \gt 0\} which is a countably affine function. When I say it is the fundamental σ\sigma-algebra I am saying the two object NN is, if you restrict to standard measurable spaces which have a σ\sigma-algebra specified by σ(F)\sigma(F) where FF is a field that has a countably basis, then the subcategory consisting of NN and all monotonic functions is codense in standard measurable spaces. (It’s hard to say anything about Meas\mathbf{Meas} because getting a codense functor into Meas\mathbf{Meas} is nontrivial.)

Moreover, both those objects 𝒢(N)\mathcal{G}(N) and NN have a super convex space structure, and the full subcategory of super convex space specified by the single object 𝒢(N)\mathcal{G}(N) is dense in super convex spaces. (That is actually trivial because super convex spaces are defined using 𝒢(N)\mathcal{G}(N). (See the nLab page.)) To find all the algebras requires dropping down several layers into the category of super convex spaces - one first seemingly has to drop down to the subcategory where those two objects, NN and 𝒢(N)\mathcal{G}(N), viewed as super convex spaces, are adequate (dense and codense).

OK. My apologies for the long comment. But I found Ruben’s work very exciting also.

Posted by: Kirk Sturtz on July 19, 2022 2:02 PM | Permalink | Reply to this

Re: Probability Monads as Codensity Monads

Thank you for your comment.

Yes, extending GG along the inclusion of countable measurable spaces in measurable spaces would also give the Giry monad. This is essentially because the measurable maps XAX\to A, for some measurable space XX and a countable set AA, contain enough information about the measurable space. However, for topological spaces this is not the case anymore, since continuous maps XAX\to A only detect clopen subsets of XX.

Instead of using a subcategory of countable spaces, we could also use a subcategory with \mathbb{N} as the only object and the codensity construction would still work. This implies that \mathbb{N} is codense in the Kleisli category of the probability monad. In the case of the Radon monad the full category with 3, the discrete three element space, as only object would even be enough and therefore this object is codense in the Kleisli category of the Radon monad.

I would be interested to hear more about the condition that \mathbb{N} and 𝒢()\mathcal{G}(\mathbb{N}) need to be dense and codense in the category of algebras of the Giry monad. Could you explain this more?

Posted by: Ruben Van Belle on July 25, 2022 12:42 PM | Permalink | Reply to this

Re: Probability Monads as Codensity Monads

Ah, I like your explanation. I should be thinking in terms of a codense functor into the Kleisi category to get the codensity monad 𝔾\mathbb{G}. I wasn’t aware that the discrete space 3 was codense for the Radon monad. Nice!

The short answer to why the inclusion functor ι\iota of the full subcategory of super convex spaces, SCvx\mathbf{SCvx}, consisting of the two objects is \mathbb{N} and Δ \Delta_{\mathbb{N}} and denoted Ω\Omega, needs to be codense is because to be able to factorize the Giry monad through a subcategory of SCvx\mathbf{SCvx}, we need the right Kan extension of jΣ:ΩStd 2Stdj \circ \mathbf{\Sigma}': \Omega \rightarrow \mathbf{Std}_2 \hookrightarrow \mathbf{Std}, where Σ:ΩStd 2\mathbf{\Sigma}': \Omega \rightarrow \mathbf{Std}_2 just views those two objects as measurable spaces, along ι\iota to give us the underlying set of a super convex space endowed with a σ\sigma-algebra (the initial σ\sigma-algebra) such that all the countably affine maps into those objects become measurable. If we only require countably affine maps into \mathbb{N} to be measurable then Δ \Delta_{\mathbb{N}} collapses to the measurable space which is isomorphic to \mathbb{N}, and hence the counit ϵ:Σ𝒫1\epsilon:\mathbf{\Sigma} \circ \mathcal{P} \Rightarrow \mathbf{1} for Δ \Delta_{\mathbb{N}} will not yield a barycenter map. (The right Kan extension can also be characterized in terms of ideals - see the nLab page for super convex spaces to get the basic idea.)

The reason we need the algebras for Std\mathbf{Std} is because there the algebras need only be measurable - as opposed to simply continuous which one gets with the Radon monad, and any probability monad on a topological space. For applications, such as quantum computation the discrete space (and their quotients) such as ϵ 2:𝒢(2)2\epsilon_2: \mathcal{G}(\mathbf{2}) \rightarrow \mathbf{2} are necessary. The recent research by Sam Staton and Ned Summers addressed the issue of quantum computation using locally convex compact Hausdorff spaces for which ϵ 2\epsilon_2 is not an algebra. But I suspect that locally convex compact Hausdorff spaces are in fact super convex spaces. (The caveat here is that the space \mathbb{C} needs to be replaced by the one point compactification \mathbb{C}_{\infty}.)

Posted by: Kirk Sturtz on July 26, 2022 4:21 AM | Permalink | Reply to this

Re: Probability Monads as Codensity Monads

Oops. The counit ϵ\epsilon should read 𝒫Σone\mathcal{P} \circ \mathbf{\Sigma} \Rightarrow \mathbf{one}.

Basically, that right Kan extension Σ\mathbf{\Sigma} yields non-separated measurable spaces for all super convex spaces which are a quotient space of a free space. The example obtained by taking the coequalizer of two points on the unit interval gives a nice example of that.

Posted by: Kirk Sturtz on July 26, 2022 5:10 AM | Permalink | Reply to this

Re: Probability Monads as Codensity Monads

The condition of being codense is NOT necessary or desirable! That would cut down the category to only free objects. An example is given by taking the coequalizer of the two distinct interior points on the unit interval to get the discrete space A={0,1,u}A=\{0,1,u\} with three points. There are no countably affine maps from AA into either \mathbb{N} or into 𝒢()\mathcal{G}(\mathbb{N}) which can separate 00 and 11.

Posted by: Kirk Sturtz on July 27, 2022 3:35 AM | Permalink | Reply to this

Re: Probability Monads as Codensity Monads

Incidentally, I wrote:

should the ultrafilter monad count as a probability monad? (An ultrafilter is a kind of primitive probability measure.)

Although in many ways I do like to think of the ultrafilter monad as a probability monad, here’s a reason not to: failure of Fubini.

I’ll explain what I mean. Let 𝒰\mathcal{U} be an ultrafilter on a set XX. For any finite set SS and function f:XSf: X \to S, there is a unique element sSs \in S such that f 1(s)𝒰f^{-1}(s) \in \mathcal{U}. This element ss deserves to be called Xfd𝒰\int_X f \, d\mathcal{U}. After all, if we’re thinking of 𝒰\mathcal{U} as something like a probability measure on XX, then Xfd𝒰\int_X f \,d\mathcal{U} should be the expected value of ff with respect to 𝒰\mathcal{U}. But f(x)=sf(x) = s for almost all xXx \in X, so Xfd𝒰=s\int_X f \, d\mathcal{U} = s.

By “failure of Fubini”, I mean that for ultrafilters 𝒰\mathcal{U} on XX and 𝒱\mathcal{V} on YY, and a function ff from X×YX \times Y to a finite set SS, it can happen that

Y Xf(x,y)d𝒰(x)d𝒱(y) X Yf(x,y)d𝒱(y)d𝒰(x). \int_Y \int_X f(x, y) \, d\mathcal{U}(x) \, d\mathcal{V}(y) \neq \int_X \int_Y f(x, y) \, d\mathcal{V}(y) \, d\mathcal{U}(x).

A counterexample: let X=Y=X = Y = \mathbb{N}, let 𝒰=𝒱\mathcal{U} = \mathcal{V} be a nonprincipal ultrafilter on \mathbb{N}, and let ff be the characteristic function of the subset {(x,y):xy}\{(x, y): x \leq y\} of ×\mathbb{N} \times \mathbb{N}.

I guess this failure of Fubini is equivalent to the statement that the ultrafilter monad is not commutative, but I haven’t thought it through… is that right?

Posted by: Tom Leinster on July 25, 2022 12:33 PM | Permalink | Reply to this

Post a New Comment