Where Do Probability Measures Come From?

October 22, 2014

Posted by Tom Leinster

Guest post by Tom Avery

Tom (here Tom means me, not him — Tom) has written several times about a piece of categorical machinery that, when given an appropriate input, churns out some well-known mathematical concepts. This machine is the process of constructing the codensity monad of a functor.

In this post, I’ll give another example of a well-known concept that arises as a codensity monad, namely probability measures. This is the subject of a paper I’ve just written.

The Giry monads

Write $\mathbf{Meas}$ for the category of measurable spaces (sets equipped with a $\sigma$ -algebra of subsets) and measurable maps. I’ll also write $I$ for the unit interval $[0,1]$ , equipped with the Borel $\sigma$ -algebra.

Let $\Omega \in \mathbf{Meas}$ . There are lots of different probability measures we can put on $\Omega$ ; write $G\Omega$ for the set of all of them.

Is $G\Omega$ a measurable space? Yes: An element of $G\Omega$ is a function that sends measurable subsets of $\Omega$ to numbers in $I$ . Turning this around, we have, for each measurable $A \subseteq \Omega$ , an evaluation map $ev_A \colon G\Omega \to I$ . Let’s give $G\Omega$ the smallest $\sigma$ -algebra such that all of these are measurable.

Is $G$ a functor? Yes: Given a measurable map $g \colon \Omega \to \Omega'$ and $\pi \in G\Omega$ , we can define the pushforward $G g(\pi)$ of $\pi$ along $g$ by

$G g(\pi)(A') = \pi(g^{-1} A')$

for measurable $A' \subseteq \Omega'$ .
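For intuition, here is a minimal Python sketch of the pushforward in the simplest case. It assumes that $\Omega$ and $\Omega'$ are finite sets with the discrete $\sigma$-algebra, so a probability measure is just a finite table of point masses. The helper names `measure_of` and `pushforward` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from fractions import Fraction

def measure_of(pi, A):
    """pi(A) for a finite probability measure pi given as {point: mass}."""
    return sum((mass for x, mass in pi.items() if x in A), Fraction(0))

def pushforward(g, pi):
    """G g(pi): each point y receives the mass pi(g^{-1}{y})."""
    out = defaultdict(Fraction)
    for x, mass in pi.items():
        out[g(x)] += mass
    return dict(out)

def parity(x):
    return x % 2

# A measure on {1, 2, 3}, pushed forward along the parity map to {0, 1}.
pi = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 2)}
pushed = pushforward(parity, pi)
# The defining equation G g(pi)(A') = pi(g^{-1} A') with A' = {1}:
assert measure_of(pushed, {1}) == measure_of(pi, {x for x in pi if parity(x) == 1})
```

On a finite discrete space it suffices to push forward point masses, because every subset is a finite union of points.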

Is $G$ a monad? Yes: Given $\omega \in \Omega$ we can define $\eta(\omega) \in G\Omega$ by

$\eta(\omega)(A) = \chi_A (\omega)$

where $A$ is a measurable subset of $\Omega$ and $\chi_A$ is its characteristic function. In other words $\eta(\omega)$ is the Dirac measure at $\omega$ . Given $\rho \in G G\Omega$ , let

$\mu(\rho)(A) = \int_{G\Omega} ev_A \,\mathrm{d}\rho$

for measurable $A \subseteq \Omega$ , where $ev_A \colon G\Omega \to I$ is as above.

This is the Giry monad $\mathbb{G} = (G,\eta,\mu)$ , first defined (unsurprisingly) by Giry in “A categorical approach to probability theory”.
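A minimal sketch of $\eta$ and $\mu$, again assuming a finite discrete $\Omega$. Measures are Python dicts, which are not hashable, so as an assumption of this sketch a finitely supported element of $GG\Omega$ is represented as a list of (weight, measure) pairs. With that representation, $\int_{G\Omega} ev_A \,\mathrm{d}\rho$ becomes a weighted sum of the values $\pi_k(A)$.

```python
from collections import defaultdict
from fractions import Fraction

def eta(omega):
    """Unit: the Dirac measure at omega, eta(omega)(A) = chi_A(omega)."""
    return {omega: Fraction(1)}

def mu(rho):
    """Multiplication on a finitely supported rho in G G(Omega).

    rho is a list of (weight, pi) pairs whose weights sum to 1, so
    mu(rho)(A) = sum_k weight_k * pi_k(A), the integral of ev_A against rho.
    """
    out = defaultdict(Fraction)
    for weight, pi in rho:
        for x, mass in pi.items():
            out[x] += weight * mass
    return dict(out)

pi = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
# Unit laws: mu(eta_{G Omega}(pi)) = pi and mu(G eta(pi)) = pi.
assert mu([(Fraction(1), pi)]) == pi
assert mu([(mass, eta(x)) for x, mass in pi.items()]) == pi
```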

A finitely additive probability measure $\pi$ is just like a probability measure, except that it is only well-behaved with respect to finite disjoint unions, rather than arbitrary countable disjoint unions. More precisely, rather than having

$\pi\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \pi(A_i)$

for disjoint $A_i$ , we just have

$\pi\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \pi(A_i)$

for disjoint $A_i$ .

We could repeat the definition of the Giry monad with “probability measure” replaced by “finitely additive probability measure”; doing so would give the finitely additive Giry monad $\mathbb{F} = (F,\eta,\mu)$ . Every probability measure is a finitely additive probability measure, but not all finitely additive probability measures are probability measures. So $\mathbb{G}$ is a proper submonad of $\mathbb{F}$ .

The Kleisli category of $\mathbb{G}$ is quite interesting. Its objects are just the measurable spaces, and the morphisms are a kind of non-deterministic map called a Markov kernel or conditional probability distribution. As a special case, a discrete space equipped with an endomorphism in the Kleisli category is a discrete-time Markov chain.
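On a finite state space, Kleisli composition of Markov kernels is just multiplication of stochastic matrices (the Chapman-Kolmogorov equation). The following is a minimal sketch, assuming finite discrete spaces and representing a kernel as a Python function from points to `{point: probability}` dicts. The names are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction

def eta(x):
    """Dirac measure: the identity morphism of the Kleisli category."""
    return {x: Fraction(1)}

def kleisli_compose(l, k):
    """Composite of Markov kernels k: X -> G(Y) and l: Y -> G(Z).

    (l . k)(x)(z) = sum_y k(x)(y) * l(y)(z), i.e. mu applied to the
    pushforward of k(x) along l.
    """
    def composite(x):
        out = defaultdict(Fraction)
        for y, p in k(x).items():
            for z, q in l(y).items():
                out[z] += p * q
        return dict(out)
    return composite

# A two-state Markov chain as a Kleisli endomorphism of {0, 1}.
P = {0: {0: Fraction(1, 2), 1: Fraction(1, 2)},
     1: {0: Fraction(1, 4), 1: Fraction(3, 4)}}

def step(s):
    return P[s]

two_steps = kleisli_compose(step, step)
print(two_steps(0))  # {0: Fraction(3, 8), 1: Fraction(5, 8)}
```

Iterating `kleisli_compose` on `step` gives the $n$-step transition probabilities of the chain.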

I’ll explain how the Giry monads arise as codensity monads, but first I’d like to mention a connection with another example of a codensity monad; namely the ultrafilter monad.

An ultrafilter $\mathcal{U}$ on a set $X$ is a set of subsets of $X$ satisfying some properties. So $\mathcal{U}$ is a subset of the powerset $\mathcal{P}X$ of $X$ , and is therefore determined by its characteristic function, which takes values in $\{0,1\} \subseteq I$ . In other words, an ultrafilter on $X$ can be thought of as a special function

$\mathcal{P}X \to I.$

It turns out that “special function” here means “finitely additive probability measure defined on all of $\mathcal{P}X$ and taking values in $\{0,1\}$ ”.
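On a finite set every ultrafilter is principal, so there the correspondence can be checked by brute force. The Python sketch below illustrates it but is not part of the original argument. It enumerates all functions $\mathcal{P}X \to \{0,1\}$ for a three-element $X$ and keeps the finitely additive probability measures. What remains are exactly the characteristic functions of the principal ultrafilters $\{A : x \in A\}$. Non-principal ultrafilters on infinite sets cannot be enumerated this way.

```python
from itertools import combinations, product

def powerset(X):
    """All subsets of a finite set X, as frozensets."""
    xs = sorted(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def is_finitely_additive_probability(m, subsets, X):
    """m(X) = 1 and m(A | B) = m(A) + m(B) whenever A and B are disjoint."""
    if m[frozenset(X)] != 1:
        return False
    return all(m[A | B] == m[A] + m[B]
               for A in subsets for B in subsets if not A & B)

def zero_one_measures(X):
    """Brute force: every {0,1}-valued finitely additive probability on P(X)."""
    subsets = powerset(X)
    found = []
    for values in product((0, 1), repeat=len(subsets)):
        m = dict(zip(subsets, values))
        if is_finitely_additive_probability(m, subsets, X):
            found.append(m)
    return found

X = {0, 1, 2}
measures = zero_one_measures(X)
principal = [{A: int(x in A) for A in powerset(X)} for x in sorted(X)]
# Exactly the principal ultrafilters {A : x in A}, one for each point x.
assert len(measures) == len(principal) == 3
assert all(m in principal for m in measures)
```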

So the ultrafilter monad on $\mathbf{Set}$ (which sends a set to the set of ultrafilters on it) is a primitive version of the finitely additive Giry monad. With this in mind, and given the fact that the ultrafilter monad is the codensity monad of the inclusion of the category of finite sets into the category of sets, it is not that surprising that the Giry monads are also codensity monads. In particular, we might expect $\mathbb{F}$ to be the codensity monad of some functor involving spaces that are “finite” in some sense, and for $\mathbb{G}$ we’ll need to include some information pertaining to countable additivity.

Integration operators

If you have a measure on a space then you can integrate functions on that space. The converse is also true: if you have a way of integrating functions on a space then you can extract a measure.

There are various ways of making this precise, the most famous of which is the Riesz-Markov-Kakutani Representation Theorem:

Theorem. Let $X$ be a compact Hausdorff space. Then the space of finite, signed Borel measures on $X$ is canonically isomorphic to

$\mathbf{NVS}(\mathbf{Top}(X,\mathbb{R}),\mathbb{R})$

as a normed vector space, where $\mathbf{Top}$ is the category of topological spaces, and $\mathbf{NVS}$ is the category of normed vector spaces.

Given a finite, signed Borel measure $\pi$ on $X$ , the corresponding map $\mathbf{Top}(X,\mathbb{R}) \to \mathbb{R}$ sends a function to its integral with respect to $\pi$ . There are various different versions of this theorem that go by the same name.

My paper contains the following more modest version, which is a correction of a claim by Sturtz.

Proposition. Finitely additive probability measures on a measurable space $\Omega$ are canonically in bijection with functions $\phi \colon \mathbf{Meas}(\Omega,I) \to I$ that are

affine: if $f,g \in \mathbf{Meas}(\Omega,I)$ and $r \in I$ then

$\phi(r f + (1-r)g) = r\phi(f) + (1-r)\phi(g),$

and

weakly averaging: if $\bar{r}$ denotes the constant function with value $r$ then $\phi(\bar{r}) = r$ .

Call such a function a finitely additive integration operator. The bijection restricts to a correspondence between (countably additive) probability measures and functions $\phi$ that additionally

respect limits: if $f_n \in \mathbf{Meas}(\Omega,I)$ is a sequence of functions converging pointwise to $0$ then $\phi(f_n)$ converges to $0$ .

Call such a function an integration operator. The integration operator corresponding to a probability measure $\pi$ sends a function $f$ to

$\int_{\Omega}f \mathrm{d}\pi,$

which justifies the name. In the other direction, given an integration operator $\phi$ , the value of the corresponding probability measure on a measurable set $A \subseteq \Omega$ is $\phi(\chi_A)$ .
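A minimal sketch of both directions of the bijection, assuming a finite discrete $\Omega$. On such a space, finite and countable additivity coincide. Pointwise convergence there is also uniform, so the "respect limits" condition holds automatically, and the sketch checks only the affine and weakly averaging conditions and the round trip $\pi(A) = \phi(\chi_A)$. The function names are hypothetical.

```python
from fractions import Fraction

def integration_operator(pi):
    """phi(f) = integral of f: Omega -> [0, 1] against a finite measure pi."""
    def phi(f):
        return sum((f(x) * mass for x, mass in pi.items()), Fraction(0))
    return phi

def measure_from_operator(phi, A):
    """Recover pi(A) as phi(chi_A)."""
    return phi(lambda y: Fraction(1) if y in A else Fraction(0))

pi = {"a": Fraction(1, 5), "b": Fraction(3, 10), "c": Fraction(1, 2)}
phi = integration_operator(pi)
f = lambda x: {"a": Fraction(1), "b": Fraction(1, 2), "c": Fraction(0)}[x]
g = lambda x: Fraction(1, 3)
r = Fraction(1, 4)

# Affine: phi(r f + (1 - r) g) = r phi(f) + (1 - r) phi(g).
assert phi(lambda x: r * f(x) + (1 - r) * g(x)) == r * phi(f) + (1 - r) * phi(g)
# Weakly averaging: phi of the constant function r is r.
assert phi(lambda x: r) == r
# Round trip: phi(chi_A) gives back pi(A).
assert measure_from_operator(phi, {"a", "b"}) == Fraction(1, 2)
```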

These bijections are measurable (with respect to a natural $\sigma$ -algebra on the set of finitely additive integration operators) and natural in $\Omega$ , so they define isomorphisms of endofunctors of $\mathbf{Meas}$ . Hence we can transfer the monad structures across the isomorphisms, and obtain descriptions of the Giry monads in terms of integration operators.

The Giry monads via codensity monads

So far so good. But what does this have to do with codensity monads? First let’s recall the definition of a codensity monad. I won’t go into a great deal of detail; for more information see Tom’s first post on the topic.

Let $U \colon \mathbb{C} \to \mathcal{M}$ be a functor. The codensity monad of $U$ is the right Kan extension of $U$ along itself. This consists of a functor $T^U \colon \mathcal{M} \to \mathcal{M}$ satisfying a universal property, which equips $T^U$ with a canonical monad structure. The codensity monad doesn’t always exist, but it does whenever $\mathbb{C}$ is small and $\mathcal{M}$ is complete. You can think of $T^U$ as a generalisation of the monad induced by the adjunction between $U$ and its left adjoint, one that still makes sense when the left adjoint doesn’t exist. In particular, when the left adjoint does exist, the two monads coincide.

The end formula for right Kan extensions gives

$T^U m = \int_{c \in \mathbb{C}} [\mathcal{M}(m,U c),U c],$

where $[\mathcal{M}(m,U c),U c]$ denotes the $\mathcal{M}(m,U c)$ power of $U c$ in $\mathcal{M}$ , i.e. the product of $\mathcal{M}(m,U c)$ (a set) copies of $U c$ (an object of $\mathcal{M}$ ) in $\mathcal{M}$ .

It doesn’t matter too much if you’re not familiar with ends because we can give an explicit description of $T^U m$ in the case that $\mathcal{M} = \mathbf{Meas}$ : The elements of $T^U\Omega$ are families $\alpha$ of functions

$\alpha_c \colon \mathbf{Meas}(\Omega, U c) \to U c$

that are natural in $c \in \mathbb{C}$ . For each $c \in \mathbb{C}$ and measurable $f \colon \Omega \to U c$ we have $ev_f \colon T^U \Omega \to U c$ mapping $\alpha$ to $\alpha_c (f)$ . The $\sigma$ -algebra on $T^U \Omega$ is the smallest such that each of these maps is measurable.

All that’s left is to say what we should choose $\mathbb{C}$ and $U$ to be in order to get the Giry monads.

A subset $c$ of a real vector space $V$ is convex if for any $x,y \in c$ and $r \in I$ the convex combination $r x + (1-r)y$ is also in $c$ , and a map $h \colon c \to c'$ between convex sets is called affine if it preserves convex combinations. So there’s a category of convex sets and affine maps between them. We will be interested in certain full subcategories of this.

Let $d_0$ be the (convex) set of sequences in $I$ that converge to $0$ (it is a subset of the vector space $c_0$ of all real sequences converging to $0$ ). Now we can define the categories of interest:

Let $\mathbb{C}$ be the category whose objects are all finite powers $I^n$ of $I$ , with all affine maps between them.
Let $\mathbb{D}$ be the category whose objects are all finite powers of $I$ , together with $d_0$ , and all affine maps between them.

All the objects of $\mathbb{C}$ and $\mathbb{D}$ can be considered as measurable spaces (as subspaces of powers of $I$ ), and all the affine maps between them are then measurable, so we have (faithful but not full) inclusions $U \colon \mathbb{C} \to \mathbf{Meas}$ and $V \colon \mathbb{D} \to \mathbf{Meas}$ .

Theorem. The codensity monad of $U$ is the finitely additive Giry monad, and the codensity monad of $V$ is the Giry monad.

Why should this be true? Let’s start with $U$ . An element of $T^U \Omega$ is a family of functions

$\alpha_{I^n} \colon\mathbf{Meas}(\Omega,I^n) \to I^n.$

But a map into $I^n$ is determined by its composites with the projections to $I$ , and these projections are affine. This means that $\alpha$ is completely determined by $\alpha_{I}$ , and the other components are obtained by applying $\alpha_{I}$ separately in each coordinate. In other words, an element of $T^U \Omega$ is a special sort of function

$\mathbf{Meas}(\Omega, I) \to I.$

Look familiar? As you might guess, the functions with the above domain and codomain that define elements of $T^U \Omega$ are precisely the finitely additive integration operators.

The affine and weakly averaging properties of $\alpha_{I}$ are enforced by naturality with respect to certain affine maps. For example, the naturality square involving the affine map

$r\pi_1 + (1-r)\pi_2 \colon I^2 \to I$

(where $\pi_i$ are the projections) forces $\alpha_I$ to preserve convex combinations of the form $r f + (1-r)g$ . The weakly averaging condition comes from naturality with respect to constant maps.
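The following Python sketch checks the naturality square for $r\pi_1 + (1-r)\pi_2$ on a concrete example. It assumes a finite discrete $\Omega$ and uses hypothetical helper names. Here `component` builds $\alpha_{I^n}$ by applying $\alpha_I$ in each coordinate, as described above. The family coming from an integration operator passes the check. The non-affine candidate $f \mapsto \phi(f)^2$ fails it, and it also fails the constant-map (weakly averaging) check.

```python
from fractions import Fraction

def integration_operator(pi):
    """alpha_I for the element of T^U(Omega) coming from a finite measure pi."""
    return lambda f: sum((f(x) * m for x, m in pi.items()), Fraction(0))

def component(alpha_I, n):
    """alpha_{I^n}, obtained by applying alpha_I in each coordinate."""
    return lambda f: tuple(alpha_I(lambda x, i=i: f(x)[i]) for i in range(n))

def natural_wrt_convex_map(alpha_I, f, r):
    """Naturality square for h = r*pi_1 + (1 - r)*pi_2: I^2 -> I,
    i.e. alpha_I(h . f) == h(alpha_{I^2}(f)) for f: Omega -> I^2."""
    h = lambda p: r * p[0] + (1 - r) * p[1]
    return alpha_I(lambda x: h(f(x))) == h(component(alpha_I, 2)(f))

pi = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
phi = integration_operator(pi)
psi = lambda f: phi(f) ** 2  # not affine, so it cannot come from T^U
f = lambda x: {"a": (Fraction(1), Fraction(0)), "b": (Fraction(1), Fraction(1))}[x]
r = Fraction(1, 2)

assert natural_wrt_convex_map(phi, f, r)
assert not natural_wrt_convex_map(psi, f, r)
# Naturality for a constant map forces weak averaging.
assert phi(lambda x: r) == r and psi(lambda x: r) != r
```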

How is the situation different for $T^V$ ? As before $\alpha \in T^V \Omega$ is determined by $\alpha_I$ , and $\alpha_{d_0}$ is obtained by applying $\alpha_I$ in each coordinate, thanks to naturality with respect to the projections. A measurable map $f \colon \Omega \to d_0$ is a sequence of maps $f_n \colon \Omega \to I$ converging pointwise to $0$ , and

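$\alpha_{d_0}(f) = (\alpha_I(f_n))_{n=1}^{\infty}.$

But $\alpha_{d_0}(f) \in d_0$ , so $\alpha_I(f_n)$ must converge to $0$ . Since every sequence $f_n \colon \Omega \to I$ of measurable functions converging pointwise to $0$ arises in this way, $\alpha_I$ respects limits. So $\alpha_I$ is an integration operator!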

The rest of the proof consists of checking that these assignments $\alpha \mapsto \alpha_{I}$ really do define isomorphisms of monads.

It’s natural to wonder how much you can alter the categories $\mathbb{C}$ and $\mathbb{D}$ without changing the codensity monads. Here’s a result to that effect:

Proposition. The categories $\mathbb{C}$ and $\mathbb{D}$ can be replaced by the monoids of affine endomorphisms of $I^2$ and $d_0$ respectively (regarded as 1-object categories, with the evident functors to $\mathbf{Meas}$ ) without changing the codensity monads.

This gives categories of convex sets that are minimal such that their inclusions into $\mathbf{Meas}$ give rise to the Giry monads. Here I mean minimal in the sense that they contain the fewest objects with all affine maps between them. They are not uniquely minimal; there are other convex sets whose monoids of affine endomorphisms also give rise to the Giry monads.

This result gives yet another characterisation of (finitely and countably) additive probability measures: a probability measure on $\Omega$ is an $\mathrm{End}(d_0)$ -set morphism

$\mathbf{Meas}(\Omega,d_0) \to d_0,$

where $\mathrm{End}(d_0)$ is the monoid of affine endomorphisms of $d_0$ . Similarly for finitely additive probability measures, with $d_0$ replaced by $I^2$ .

What about maximal categories of convex sets giving rise to the Giry monads? I don’t have a definitive answer to this question, but you can at least throw in all bounded, convex subsets of Euclidean space:

Proposition. Let $\mathbb{C}'$ be the category of all bounded, convex subsets of $\mathbb{R}^n$ (where $n$ varies) and affine maps. Let $\mathbb{D}'$ be $\mathbb{C}'$ but with $d_0$ adjoined. Then replacing $\mathbb{C}$ by $\mathbb{C}'$ and $\mathbb{D}$ by $\mathbb{D}'$ does not change the codensity monads.

The definition of $\mathbb{D}'$ is a bit unsatisfying; $d_0$ feels (and literally is) tacked on. It would be nice to have a characterisation of all the subsets of $\mathbb{R}^{\mathbb{N}}$ (or indeed all the convex sets) that can be included in $\mathbb{D}'$ . But so far I haven’t found one.

Posted at October 22, 2014 2:29 PM UTC

22 Comments & 0 Trackbacks

Re: Where Do Probability Measures Come From?

I believe the answer to the question you pose, “What about maximal categories of convex sets giving rise to the Giry monads?”, is the category of convex spaces. As you keenly noted, I forgot to add the continuity at zero condition to the description of the monad $\mathcal{P}$ in my paper :( Simply adding the continuity at zero condition (or equivalently, continuity from below or continuity from above) when characterizing the subfunctor of the double dualization monad, you obtain countable additivity, and that monad is naturally isomorphic to the Giry monad. The arguments in my paper remain unchanged. Hence we have a functor from the category of convex spaces to the category of measurable spaces, which I denoted by $\iota$ . The monad $\mathcal{P}$ is the right Kan extension of $\iota$ along itself (Theorem 6.2).

Posted by: kirk sturtz on October 23, 2014 8:12 AM

Re: Where Do Probability Measures Come From?

It may well be the case that your argument can be adapted by adding the continuity requirement in the definitions of $\mathcal{P}$ and $\iota$ (in the notation of Theorem 6.2 of your paper, one would then need to show that each $\eta(z)$ satisfies this continuity condition, but it’s certainly believable that this could be done similarly to the other properties). As it stands your proof certainly works for the finitely additive Giry monad. Sorry, I should have pointed that out. In that case, the Giry monad is the codensity monad of a functor from the category of all convex spaces to $\mathbf{Meas}$ .

However, your functor $\iota$ is quite different from my functor $V$ . Writing $\mathbf{Cvx}$ for that category of convex spaces, $\iota$ sends a convex space $c$ to the subset of $\mathbf{Cvx}(\mathbf{Cvx}(c,I),I)$ consisting of the affine maps that are additionally weakly averaging (and possibly also limit-preserving?), equipped with a suitable $\sigma$ -algebra.

On the other hand, for $c \in \mathbb{D}$ , $V c$ has the same underlying set as $c$ , with a suitable $\sigma$ -algebra.

So I suppose what I was really asking was: “What’s the largest extension $\mathbb{D}'$ of $\mathbb{D}$ equipped with a functor $V' \colon \mathbb{D}' \to \mathbf{Meas}$ such that the codensity monad of $V'$ is the Giry monad and the evident triangle between $\mathbb{D}$ , $\mathbb{D}'$ and $\mathbf{Meas}$ commutes?”

This is a more difficult question, partly because it’s not even clear what the best way to equip an arbitrary convex space with a $\sigma$ -algebra is.

Posted by: Tom Avery on October 23, 2014 10:37 AM

Re: Where Do Probability Measures Come From?

If you are thinking of the functor V as an insertion of a convex vector space into Meas then, if the space is also bounded, you can use the generalized metric given by Lawvere and further developed by his student Meng in her dissertation. However, if the space is not bounded, the resulting $\sigma$ -algebras are trivial, e.g., using that generalized metric $d(A,B) = -\log \sup_{t \in [0,1]} t$ subject to … . Lawvere’s initial reference to this metric is in his paper Metric Spaces, Generalized Logic, and Closed Categories, but the presentation in Meng’s dissertation is much clearer. I can send you a photocopy of it if you do not have access to it; the results in Meng’s dissertation are unfortunately not readily available in the open literature. Her dissertation makes some nice connections between the category of convex spaces and the category of measurable spaces. It is an extension of Lawvere’s Generalized Metric Space paper mentioned above. If you are looking at an arbitrary convex space, then obtaining an extension of $D \hookrightarrow Meas$ is indeed a difficult question. In a paper by Borger and Kemp, they show that the interval $(-\infty,\infty]$ is a cogenerator for the category of convex spaces, which gives a separability condition on these functionals that is relevant to obtaining an extension.

With regard to the continuity condition, showing that $\eta(z)$ satisfies the continuity condition is straightforward. Indeed, if it were not true then your results would also be in error, as our monads on Meas are identical. (Your subfunctor $G \hookrightarrow F$ is my functor $P$ .)

Posted by: kirk sturtz on October 23, 2014 3:24 PM

Re: Where Do Probability Measures Come From?

Perhaps Meng’s thesis could be hosted somewhere online, if this is not objectionable to her.

Posted by: David Roberts on October 24, 2014 1:07 AM

Re: Where Do Probability Measures Come From?

Thanks for sending a copy of Meng’s thesis. It looks interesting! I agree it would be good for it to be available online somewhere.

I am happy to believe that it’s straightforward to check the continuity condition for $\eta (z)$ , I just haven’t checked it myself. I’m not sure I agree that my result depends on it though. Since $\eta (z)$ is defined in terms of $\iota$ , and $\iota$ doesn’t appear in my argument, I don’t see how it’s relevant. Of course, I need to show that something defined in terms of $V$ satisfies the continuity condition, and I do (this is also straightforward).

Posted by: Tom Avery on October 24, 2014 9:42 AM

Re: Where Do Probability Measures Come From?

Kirk emailed Meng and got her permission to post a copy on the nLab. I’ve converted the scan to a djvu, and will upload it. I’m thinking the page metric space would be a start, but not sure where else. Maybe a page for Meng herself?

Posted by: David Roberts on November 6, 2014 3:53 AM

Re: Where Do Probability Measures Come From?

Interesting! I observe that your category $\mathbb{C}$ is, among other things, a Lawvere theory, and thus the functor $U$ exhibits $I$ as a model of this theory. That makes me wonder: suppose I have an arbitrary Lawvere theory and a functor exhibiting a model of it; is the codensity monad of that functor interesting? Should I already know the answer?

Posted by: Mike Shulman on October 23, 2014 8:45 AM

Re: Where Do Probability Measures Come From?

An easy calculation for $\mathcal{M} = \mathbf{Set}$ shows that the resulting $T^U : \mathbf{Set} \to \mathbf{Set}$ is given by $T^U (X) \cong [\mathbb{C}, \mathbf{Set}](U^X, U)$ and since $\mathbb{C}$ -algebras are a full subcategory of $[\mathbb{C}, \mathbf{Set}]$ , when $U$ is a $\mathbb{C}$ -algebra, we could equally well write $T^U (X) \cong Hom (U^X, U)$ which looks a lot like a double dualisation!

Indeed, it is a subfunctor of $X \mapsto [[X, U(1)], U(1)]$ , where $U(1)$ is the underlying set of the $\mathbb{C}$ -algebra $U$ . I haven’t checked, but I would guess that $T^U$ is actually a submonad of the “endomorphism” monad of $U(1)$ , and on that basis, I would guess that monad homomorphisms $S \to T^U$ correspond to $S$ -algebra structures on $U$ as a $\mathbb{C}$ -algebra.

Posted by: Zhen Lin on October 23, 2014 10:45 AM

Re: Where Do Probability Measures Come From?

Nice! This looks very similar to the “Monad of Schwartz distributions” considered by Anders Kock in “Commutative monads as a theory of distributions”, Section 11.

One thing that puzzles me slightly is that Kock requires the original monad (in this case the monad corresponding to $\mathbb{C}$ ) to be commutative before defining the monad of Schwartz distributions, whereas commutativity doesn’t seem to play a role in what you’ve just said. Perhaps commutativity just gives some better properties, or comes from the fact that Kock is considering strong monads on arbitrary Cartesian closed categories rather than just $\mathbf{Set}$ .

Posted by: Tom Avery on October 23, 2014 11:46 AM

Re: Where Do Probability Measures Come From?

Out of interest, do you know if this Lawvere theory has been studied at all previously? Possibly related: if you replace cubes $I^n$ with simplices $\Delta^n$ you get the opposite of the Lawvere theory for convex spaces, and this still gives rise to the finitely additive Giry monad in the same way. But I hadn’t considered $\mathbb{C}$ itself as a Lawvere theory.

Posted by: Tom Avery on October 23, 2014 11:08 AM

Re: Where Do Probability Measures Come From?

I would like to comment on the wider significance of Tom’s paper, as well as my work in this area. Tom’s presentation is beautiful for its simplicity and avoids the symmetric monoidal closed property that I stress in my paper. My use of the SMC property is deliberate, as I am pursuing categorical probability theory on more general categories. By stressing the SMC property it becomes readily obvious that (Bayesian) probability theory can be applied to any category with the properties (1) it is an SMC category, and (2) it has a convex ordered object. The second condition may be unnecessary; that is yet to be determined. This work combined with previous work on Bayesian Machine Learning - which is simply Bayesian probability theory on function spaces (which requires SMC) - shows that it is possible to do probability theory in a very general categorical setting. I should mention Bob Coecke and Robert Spekkens’ work, Picturing classical and quantum Bayesian inference, which showed that for FINITE spaces one has probability theory on SMC categories (I don’t recall the exact details, but they recognized probability theory applied in a more general setting).

In another comment below Tom noted/questioned the role of the commutativity of the Giry monad and Anders Kock’s work. The commutativity of the Giry monad, which amounts simply to Tonelli’s as well as Fubini’s Theorem on double integrals, is not required for probability theory. The theory readily extends to non-commutative probability theory and only requires the SMC property - the Cartesian Closed property is not necessary. For example, IF the category of lattices is SMC (I am not sure of the answer to this question - the category may need to be cut down???) then the theory immediately applies to Lat and one observes non-commutative probability on ortho-complemented lattices used in quantum mechanics. The details of this proposed work need to be filled in, but it provides ample opportunities for anyone interested in categorical probability theory.

Posted by: Kirk Sturtz on October 24, 2014 1:56 PM

Re: Where Do Probability Measures Come From?

…previous work on Bayesian Machine Learning - which is simply Bayesian probability theory on function spaces

Do you mean here Gaussian processes?

Posted by: David Corfield on October 26, 2014 10:30 AM

Re: Where Do Probability Measures Come From?

It includes Gaussian Processes (the most practical case), but I do not believe it is limited to Gaussian Processes. The analytic solution for the GP is spelled out categorically in the paper “Bayesian Machine Learning via Category Theory” by Jared Culbertson and myself, where the inference map can also be explicitly constructed. A quick glance at the figures related to parametric and nonparametric models in Section 7 suffices to get the idea. One simply needs the function spaces $Y^X$ … the details are (yawn) an implementation of what the ML community already knows, but the categorical characterization shows the general framework for thinking about these issues categorically. In that paper the Kleisli category, which is SMwCC (weakly closed), is sufficient to do Bayesian ML.

Posted by: kirk sturtz on October 26, 2014 11:49 AM

Re: Where Do Probability Measures Come From?

Ah, I hadn’t realised that you were one of the authors of that paper, which I’d made a note to read.

I was once much more involved in this area. I had an idea back then that one could think about Gaussian process use in terms of Bayesian information geometry and infinite-dimensional exponential families.

Posted by: David Corfield on October 26, 2014 1:18 PM

Re: Where Do Probability Measures Come From?

Kirk Sturtz: How does your proposal relate to, say, The Bayesian interpretation of quantum physics ?

Posted by: Bas Spitters on October 26, 2014 3:42 PM

Re: Where Do Probability Measures Come From?

Provided you have a SMCC to model your equations you are good to go - it is a modeling problem. For modeling QM the underlying SMCC could be taken to be any SMCC - I believe the Rosetta Stone paper goes into detail on these. Your knowledge far outstrips anything I could say here - but SMCC are relevant here. ( $\mathbf{Disclaimer}$ : I want to emphatically state my complete ignorance of QM… but lack of knowledge and skill has never stopped me before.) Let me try to explain a bit further in regards to the QM problem.

The most elementary approach to a Bayesian viewpoint on QM is via Bohmian Mechanics (de Broglie-Bohm Theory/pilot wave Theory). This theory is deterministic if the initial configuration state is known. However, as I recall, if one takes a uniform distribution on the initial configuration state, then one obtains the same results as the traditional approach that every physicist is taught. From this perspective the Sampling Distribution $\mathcal{S}$ is deterministic, while the prior probability $P_H$ requires a stochastic model. (The generic Bayesian model is shown in Figure 2 of the above-mentioned paper. Moving to Function Spaces you get Figure 17, the same “triangle”.) At that point the problem is like any other Bayesian model. All the work here is in constructing the sampling distribution via Schrödinger’s equation. Of course, when the composite of the sampling distribution with the prior gets nasty, MCMC methods are required to compute the inference map.

Posted by: kirk sturtz on October 26, 2014 7:44 PM

Re: Where Do Probability Measures Come From?

respect limits: if $f_n \in \mathbf{Meas}(\Omega,I)$ is a sequence of functions converging pointwise to $0$ then $\phi(f_n)$ converges to $0$ .

Does it make any difference to require this for all nets and not merely for sequences?

Posted by: Toby Bartels on October 26, 2014 4:02 AM

Re: Where Do Probability Measures Come From?

Yes, I think so. Requiring this for arbitrary nets amounts to a strengthening of the monotone convergence theorem, which not all probability measures satisfy. The following is a counterexample for nets whose underlying directed set is the first uncountable ordinal $\omega_1$ .

Let $\Omega$ be the set of countable ordinals, that is, $\Omega$ is the underlying set of $\omega_1$ . Equip $\Omega$ with the countable/cocountable $\sigma$ -algebra. Let $\pi$ be the probability measure on $\Omega$ taking value $1$ on cocountable sets and $0$ on countable sets.

For each $\beta \lt \omega_1$ let $f_{\beta}(\gamma) = 0$ if $\gamma \lt \beta$ and $1$ if $\gamma \geq \beta$ .

The integral of each $f_{\beta}$ with respect to $\pi$ is $1$ , since they take the value $1$ almost everywhere.

But their pointwise $\omega_1$ -limit is $0$ .

Posted by: Tom Avery on October 27, 2014 11:56 AM

Re: Where Do Probability Measures Come From?

This is neat stuff, Tom! I’m particularly struck by the role that convexity seems to play.

It also seems like there’s a lot of room to vary things in this construction:

Have you thought about functors whose codensity monads might yield signed measures, unbounded measures, complex-valued measures, etc? Actually, I don’t even know which of these constructions are monads, let alone codensity monads…
$U$ and $V$ equip $I^n$ with the Borel $\sigma$ -algebra. If you use the Lebesgue $\sigma$ -algebra, instead, what happens to the codensity monad?
It also seems strange to me to consider the finitely-additive monad $\mathbb{F}$ on measurable spaces, simply because a $\sigma$ -algebra satisfies axioms about countable unions and intersections which seem unnecessary for the definition of a finitely-additive measure. Does $\mathbb{F}$ arise as a codensity monad on the category of all algebras (where an algebra is a set equipped with a boolean subalgebra of its powerset)?

Posted by: Tim Campion on October 27, 2014 2:49 AM

Re: Where Do Probability Measures Come From?

I don’t know which of these constructions are monads, let alone codensity monads…

I guess what you had in mind here when you said “codensity monad” was “codensity monad of some functor similar to the one above”. But in case anyone reading this is misled, I can’t resist pointing out:

Every monad is a codensity monad.

That’s simply because every monad is induced by some adjunction $F \dashv G$ , which is then the codensity monad of $G$ .

I’m beginning to think that the terminology “codensity monad” is too weighty. Perhaps it would be better to speak of the induced monad of a functor, or simply the monad of a functor.

Posted by: Tom Leinster on October 27, 2014 9:52 AM

Re: Where Do Probability Measures Come From?

Thanks Tim! Interesting questions. I’ve thought/am planning to think a bit about some of these.

Not all of these constructions are monads (at least, not in the way that most obviously generalises the Giry monads). The problem is with the multiplication, which in the Giry monads is defined using integration. But, for example, when you try an analogous definition for finite, positive measures, the integrals that should define the multiplication may be infinite. On the other hand, I think positive, possibly infinite measures do form a monad, because every measurable function taking values in $[0,\infty]$ has an integral in $[0,\infty]$ . I haven’t (yet) thought very hard about whether this monad can be realised as a codensity monad in an interesting way.
Again, interesting question. I don’t know the answer.
I’m fairly sure all the finitely additive stuff should work in the same way for sets equipped with Boolean algebras (but I haven’t checked thoroughly). I mainly used $\sigma$ -algebras for both so that I could describe $\mathbb{G}$ as a submonad of $\mathbb{F}$ , and for brevity.

Posted by: Tom Avery on October 27, 2014 3:03 PM

Re: Where Do Probability Measures Come From?

I believe Tim’s question, from my “subfunctor of the double dualization monad” point of view, is that the monad $T$ defined on component $X$ by

$T(X) = \{ G \colon I^X \to I \mid G \text{ is weakly averaging, affine, and continuous at } \emptyset \}$

should be able to be generalized to

$T(X) = \{ G \colon K^X \to K \mid G \text{ is weakly averaging, affine, and continuous at } \emptyset \},$

where $K$ is taken to be $[0,\infty]$ , $[-1,1]$ , the closed unit disk (viewed in the complex plane), etc. The key point is (seemingly) that $K$ needs to be convex and have a partial order.

Is $T$ still a subfunctor of the double dualization monad? If so, then as Tom L. reiterates, they certainly come from some functor mapping into $Meas$ .

Abstractly, we should be able to call those elements of the monad $T$ (assuming it is a submonad) “ $K$ -valued (probability) measures”.

Posted by: kirk sturtz on October 27, 2014 5:19 PM

The n-Category Café
