Skip to the Main Content

Note:These pages make extensive use of the latest XHTML and CSS Standards. They ought to look great in any standards-compliant modern browser. Unfortunately, they will probably look horrible in older browsers, like Netscape 4.x and IE 4.x. Moreover, many posts use MathML, which is, currently only supported in Mozilla. My best suggestion (and you will thank me when surfing an ever-increasing number of sites on the web which have been crafted to use the new standards) is to upgrade to the latest version of your browser. If that's not possible, consider moving to the Standards-compliant and open-source Mozilla browser.

March 24, 2019

The Kantorovich Monad

Posted by Simon Willerton

guest post by Paolo Perrone

On this blog there has been quite a number of posts about the interaction of category theory and probability (for example here, here, here, and here), as well as about applying category theory to the study of convex structures (for example here, here, and here).

One of the central constructions in categorical probability theory is that of probability monad. A probability monad can be thought of as a way to extend spaces in order for them to include, besides their elements, also their random elements.

euro_monad.png

Here I would like to talk about a probability monad on the category of metric spaces: the Kantorovich monad. As some of you may suspect, this is related to Kantorovich duality, which appeared on this blog in the context of enriched profunctors. The Kantorovich monad was introduced by Franck van Breugel for compact metric spaces in his note The Metric Monad for Probabilistic Nondeterminism (pdf). In the work that resulted in my PhD thesis (pdf), Tobias Fritz and I extended the construction, and discovered some very particular properties of this monad. In particular, this monad can be described purely in terms of combinatorics of finite sequences of elements! Most of the work explained in this post can be found in the paper by me and Tobias:

A quick introduction to monads

If you are already familiar with monads, you may skip this section. For everybody else, a monad on a category CC is one of the most important constructions in category theory. It consists of the following (subject to axioms you can find at the above link):

  • an endofunctor T:CCT\colon C\to C on our category;
  • a natural transformation μ:TTT\mu\colon T T\to T, called the multiplication of the monad;
  • a natural transformation η:1T\eta\colon 1\to T from the identity on CC to TT, called the unit of the monad.

A way to interpret and motivate the construction is the idea that a monad is like a consistent way of extending spaces to include generalized elements of a specific kind.

  • The endofunctor T:CCT\colon C\to C takes a space XX and gives a new space TXT X which can be thought of as an extension of XX.
  • The unit natural transformation η:1T\eta\colon 1\to T gives embeddings XTXX\to T X from the old spaces XX into their extensions TXT X. In other words, the new extended spaces have to contain the old ones.
  • The multiplication natural transformation μ:TTT\mu\colon T T\to T takes a extended extended space (twice) and maps it back to a one-time extended space.

As an example, take the power set monad on the category of sets.

  • The endofunctor TT takes a set XX and gives the set TXT X of its subsets. Subsets can be considered a generalization of elements.
  • The unit gives maps XTXX\to T X which assign to xXx\in X the singleton {x}TX\{x\}\in T X. In this sense a subset is more general than an element: the elements correspond precisely to the subsets which are singletons.
  • The multiplication μ:TTT\mu\colon T T\to T maps subsets of subsets of XX into their union, which is a subset of XX. For example,

{{x,y},{x,z}}{x,y,z}. \{\{x,y\}, \{x,z\}\} \;\mapsto\; \{x,y,z\} .

Another motivation for the theory monads is that a monad is like a consistent choice of spaces of formal expressions of a specific kind. In this interpretation, the endofunctor T:CCT\colon C\to C takes a space XX and gives a new space TXT X which can be thought of as containing formal expressions of a specific kind. One example is that of formal sums. If XX contains elements

x,y,z, x,y,z,\dots

then TXT X contains finite strings such as

x+y+z,x+x,z. x + y + z, \; x + x, \; z.

The unit gives maps XTXX\to T X which assign to each element of xx the trivial formal expression containing only that one element. For example, for the formal sum case, xx is mapped to the trivial formal sum

x x

in which nothing else is added to xx.

The multiplication gives maps from formal sums of formal sums (twice) to formal sums, which reduce the expression by removing the nesting. For example,

(x+y)+(z+t)x+y+z+t. (x + y) + (z + t) \; \mapsto \; x + y + z + t .

So far, all the operations are purely formal. There are spaces where these expressions can actually evaluated. For example, the sum

3+4+5 3+4+5

has the result 99. The spaces where these formal expressions have a result are called the algebras of the monad. In particular, an algebra of the monad TT consists of an object AA together with a map TAAT A\to A which intuitively assigns to an expression its result. Again, the map is required to satisfy some axioms (which say, for example, that the trivial formal expression xx has result xx).

For a more detailed but still nontechnical introduction to monads you can take a look at Chapter 1 of my thesis (pdf).

Probability monads

The idea of a probability monad is that we want to assign to a space XX a new space PXP X containing random elements of XX. This is consistent with the two interpretations given above:

  • random elements can be thought of as a generalization of ordinary elements;
  • random elements can be thought of as formal convex combinations of ordinary elements.

Probability monads were introduced by Michèle Giry in her work A Categorical approach to probability theory (pdf). A possible way of interpreting her work is the following. First of all we consider a “base” category CC, whose objects we think of as spaces of possible outcomes, and whose morphisms we think of as functions between such spaces. For example, the outcomes of a coin flip form a set of two elements, {Heads,Tails}\{Heads, Tails\}. Spaces of outcomes may have additional structure, usually at least a sigma-algebra, or even a topology or metric, and the morphisms of CC are compatible with that extra structure.

The category CC is equipped with a monad (P,E,δ)(P, E, \delta), called probability monad, consisting of an endofunctor PP, and natural transformations E:PPPE\colon P P\to P and δ:1P\delta\colon 1\to P. Let’s see more in detail what these encode.

To each space of outcomes XX, the functor PP assigns a space PXP X of random outcomes. Usually, these are the probability measures on XX,or sometimes valuations. For example, P{Heads,Tails}P \{Heads, Tails\} contains all formal convex combinations, that is formal real linear combinations with the coefficients lying between 00 and 11 and adding to 11. Here are some examples:

  • 0Heads+1Tails0 \,Heads + 1\, Tails,
  • 12Heads+12Tails\frac{1}{2}\, Heads + \frac{1}{2}\, Tails,
  • 1Heads+0Tails1\, Heads + 0\, Tails,
  • 34Heads+14Tails\frac{3}{4}\, Heads + \frac{1}{4}\, Tails.

For each space XX, the unit δ\delta gives a natural map from XX into the space PXP X of random elements. It is usually an embedding, and it maps each (deterministic) outcome xXx\in X to the outcome assigning probability 11 to xx and 00 to everything else. The outcomes in the image of δ\delta are, in other words, those that are not really random. For the example of the coin, we have

  • δ(Heads)=1Heads+0Tails\delta(Heads) = 1\, Heads + 0\, Tails
  • δ(Tails)=0Heads+1Tails\delta(Tails) = 0\, Heads + 1\, Tails.

For each space XX, the multiplication gives a natural map PPXPXP P X \to P X. The space PPXP P X can be thought of as containing laws of random variables whose law is also random, or “random random variables”.

Here is an example. Suppose that I have two different coins in my pocket, one with the usual Heads and Tails sides, and one that has two Heads. These coins have different laws: one has 12\frac{1}{2} chance of Heads and 12\frac{1}{2} chance of Tails, and the other one yields Heads with probability 1.

If I now keep both coins in my pocket and I draw one at random, and then flip it, there is 12\frac{1}{2} probability of extracting the first coin, itself with law (12Heads+12Tails)(\frac{1}{2}\,Heads + \frac{1}{2}\,Tails), and 12\frac{1}{2} probability of extracting the second coin, itself with law (1Heads+0Tails)(1\,Heads + 0\,Tails). We can model the situation as

12(12Heads+12Tails)+12(1Heads+0Tails). \frac{1}{2}\,(\frac{1}{2}\, Heads + \frac{1}{2}\, Tails) + \frac{1}{2}\,(1\, Heads + 0\, Tails).

The space PPXP P X will contain elements of form “formal convex combinations of formal convex combinations” (nested, hence the brackets). Now, as you probably have thought already, the overall probability of obtaining Heads from this situation is 34\frac{3}{4}, and that of Tails is 14\frac{1}{4}. This is a usual random outcome, an element of PXP X. We have obtained it by averaging:

12(12Heads+12Tails)+12(1Heads+0Tails)34Heads+14Tails. \frac{1}{2}\,\bigg(\frac{1}{2}\, Heads + \frac{1}{2}\, Tails\bigg) + \frac{1}{2}\,\bigg(1\, Heads + 0\, Tails\bigg) \; \mapsto \; \frac{3}{4}\, Heads + \frac{1}{4}\, Tails.

This averaging process is the map E:PPXPXE\colon P P X \to P X, the multiplication of the monad.

How this is done in practice may vary depending on the choice of base category, and of monad. In general, the convex combinations may be infinite, that is, expressed by an integral rather than just a sum. This is where most of the analytical difficulties come.

In the example above, we have never actually averaged between Heads and Tails: the space {Heads,Tails}\{Heads, Tails\} is not closed under taking averages, in any sense. However, there are many situations in which the average is not just a formal expression, but it has an actual result. For example, for real numbers, 123+125=4\frac{1}{2} \cdot 3 + \frac{1}{2} \cdot 5 = 4. An algebra of the probability monad PP is precisely a space AA which is equipped with a way of evaluating those formal convex combinations. That amounts to a map e:PAAe\colon P A\to A satisfying the usual laws of algebras of a monad. In other words, algebras of a probability monad tend to look like generalized convex spaces. The morphisms of algebras are then maps which commute with the convex structure, which we can think of as generalized affine maps. Taking averages, or expected values, is one of the most important operations in probability theory. The spaces where this can be done are exactly the algebras of a probability monad.

Again, more motivation for probability monads can be found in Chapter 1 of my thesis (pdf), in case you are interested!

The Kantorovich functor

Following the guidelines above, we now pick as base category the category of complete metric spaces, which we denote as CMetCMet:

  • objects are complete metric spaces;
  • morphisms are short maps, or 1-Lipschitz maps, which are the maps f:XYf\colon X \to Y such that for all x,xXx, x' \in X,

d(f(x),f(x))d(x,x). d(f(x),f(x')) \; \le \; d(x,x') .

You may know this category since, following Lawvere, it can be thought of as a category of enriched categories and functors (see for example this post, or the original paper).

Given a complete metric space XX, we define PXP X to be the space of Radon probability measures on XX with finite first moment. The latter condition means that the average distance is finite: pp has finite first moment if and only if

Xd(x,y)dp(x)dp(y) \int_X d(x,y) \, d p(x)\,d p(y)

is a finite number. Equivalently, this is saying that all short maps XX\to\mathbb{R} are pp-integrable.

The metric that we assign to PXP X is the celebrated Kantorovich metric, also known as Kantorovich-Rubinstein metric, or 1-Wasserstein metric, or earth mover’s distance. The basic idea behind this metric is that it is a sort of convex extension of the metric of XX. If we view the space of probability measures PXP X as an extension of XX in which XX sits embedded via the unit map δ:XPX\delta\colon X\to P X, it is natural to require that the metric on PXP X makes δ\delta an isometric embedding, i.e. for all x,yXx,y\in X,

d(δ(x),δ(y))=d(x,y). d\big(\delta(x),\delta(y)\big) = d(x,y) .

This requirement makes Wasserstein metrics different from other metrics for probability measures (such as the total variational distance), in that point measures over neighboring points are themselves neighboring, even if they have no actual overlap. So Wasserstein metrics keep track nontrivially of the distance and topology of the underlying space. The condition above is of course not enough to determine the metric uniquely, we need to see how the metric works when the measures are nontrivial. Consider three points x,y 1,y 2Xx, y_1, y_2\in X, and the probability measures p=δ(x)p=\delta(x) and q=12δ(y 1)+12δ(y 2)q=\frac{1}{2}\,\delta(y_1) + \frac{1}{2}\,\delta(y_2). We would like d(p,q)d(p,q) to lie between d(x,y 1)d(x,y_1) and d X(x,y 2)d_X(x,y_2). A possible choice is

d(p,q)=12d(x,y 1)+12d(x,y 2). d(p,q) = \frac{1}{2}\,d(x,y_1) + \frac{1}{2}\,d(x,y_2) .

This can be interpreted in the following way: half the mass of pp has to be moved from xx to y 1y_1, and the other half from xx to y 2y_2. Therefore the total cost of transport is

massdistance+massdistance=12d(x,y 1)+12d(x,y 2). mass \cdot distance + mass \cdot distance = \frac{1}{2}\,d(x,y_1) + \frac{1}{2}\,d(x,y_2) .

For this reason, this distance is sometimes called the earth mover’s distance. Another interpretation, in line with the idea of formal convex combinations, would be that the distance between formal convex combinations is the convex combination of the distances. If pp also is nontrivial, for example p=12δ x 1+12δ x 2p=\frac{1}{2}\,\delta_{x_1} + \frac{1}{2}\,\delta_{x_2}, there are at least two possible ways of moving the mass between pp and qq: moving the mass at x 1x_1 to y 1y_1 and the mass at x 2x_2 to y 2y_2, or moving the mass at x 1x_1 to y 2y_2 and the mass at x 2x_2 to y 1y_1, or even a combination of the two. In this case, the distance will be the optimal choice between these possibilities, that is

d(p,q)=min σS 212(d(x 1,y σ(1))+d(x 2,y σ(2))). d(p,q) = \min_{\sigma\in S_2} \frac{1}{2} \big( d(x_1,y_{\sigma(1)}) + d(x_2,y_{\sigma(2)}) \big) .

Since we are optimizing an affine functional and all the possibilities form a convex set, it is sufficient to optimize over the extreme points, which are permutations (in this case, of y 1y_1 and y 2y_2). This procedure, after taking a suitable limit, specifies the metric uniquely. Below, I’ll give a sketch of how this is done in practice.

On a short map f:XYf\colon X\to Y, the functor PP gives the pushforward of measures PXPYP X \to P Y between the spaces of probability measures over XX and YY. The fact that ff is short assures that the pushforward measures will have finite first moment too, and it is easy to check that the map PfP f will be short as well.

Colimit characterization

The Kantorovich functor PP can be expressed as a colimit over spaces of finite sequences. Let’s see how.

Let XX be a complete metric space. Among the measures of PXP X there are some which are particularly simple: let’s call simple measures the ones of the form

p=1n(δ(x 1)++δ(x n)), p = \frac{1}{n} \, \big(\delta(x_1) + \dots + \delta(x_n)\big) ,

for points x 1,,x nXx_1,\dots,x_n\in X. These measures can be thought of as the empirical distributions coming from finite sequences (x 1,,x n)(x_1,\dots,x_n). It is well-known that simple measures are dense in PXP X. In other words, with the Kantorovich metric, we can approximate any probability measure pPXp\in P X by a sequence of simple measures.

Thanks to the choice of our category, we can express this density property in the language of category theory as a colimit. In particular, for each nn\in \mathbb{N}, let X nX^n denote the nn-th cartesian power of XX, equipped with the rescaled metric:

d((x 1,,x n),(y 1,,y n))1n i=1 nd(x i,y i). d\big( (x_1,\dots,x_n) , (y_1,\dots,y_n) \big) \coloneqq \frac{1}{n} \sum_{i=1}^n d(x_i,y_i) .

Let us now take a quotient space of X nX^n with respect to permutations, effectively defining the space of nn-tuples of elements of XX up to reordering. In rigor, we define the space X nX_n to be the quotient space of X nX^n under the action of the symmetric group S nS_n, with the quotient metric:

d((x 1,,x n),(y 1,,y n))min σS n1n i=1 nd(x i,y σ(i)). d\big( (x_1,\dots,x_n) , (y_1,\dots,y_n) \big)\coloneqq\min_{\sigma\in S_n} \frac{1}{n} \sum_{i=1}^n d(x_i,y_{\sigma(i)}) .

The spaces X nX^n and X nX_n are again complete metric spaces, objects of CMetCMet. Now the density result, intuitively, says that PXP X is in some sense the colimit over nn of the spaces X nX_n. To make this precise, we have to give the indexing category over which to take the colimit. So let NN be the category whose objects are nonzero natural numbers, and with a morphism nmn\to m if and only if nn divides mm. We can construct a functor X :NCMetX_{-}\colon N\to CMet mapping the number nn to the space X nX_n, and the morphism nnkn\to n k (for n,kn,k natural numbers) to the function X nX nkX_n\to X_{n k} which copies kk times the sequence (x 1,,x n)(x_1,\dots,x_n):

(x 1,,x n)(x 1,,x n,x 1,,x n,,x 1,,x n). (x_1,\dots,x_n) \mapsto (x_1,\dots,x_n,x_1,\dots,x_n,\dots,x_1,\dots,x_n) .

The construction is also functorial in XX: given a map f:XYf\colon X\to Y we get a map X nY nX_n\to Y_n by just applying ff elementwise. The fact that simple measures are dense in PXP X now reads as follows.

Theorem. The space of probability measures PXP X can be expressed as the following colimit: PX=colim nNX n. P X = colim_{n\in N} X_n .

A similar statement can also be given using the X nX^n, without quotienting over permutations (see our paper for more details on this).

Monad structure arising from the colimit

We have seen that we can determine the functor PP in terms of the spaces of finite sequences X nX_n. The monad structure of PP can be also defined purely in terms of finite sequences, in a way which is quite close to how one uses integration in practice (at least, in the interpretation given above). Let’s see how. First of all, let mm and nn be natural numbers, and consider the elements of (X m) n(X_m)_n, which are the sequences of sequences, in the form

((x 11,,x m1),,(x 1n,,x mn)). ((x_{1 1},\dots,x_{m 1}),\dots, (x_{1 n},\dots,x_{m n})) .

There is a map E m,n:(X m) nX mnE_{m,n}\colon (X_m)_n \to X_{m n} which we can think of as “removing the inner brackets”, or “flattening the array”:

((x 11,,x m1),,(x 1n,,x mn))(x 11,,x m1,,x 1n,,x mn). ((x_{1 1},\dots,x_{m 1}),\dots, (x_{1 n},\dots,x_{m n})) \; \mapsto \; (x_{1 1},\dots,x_{m 1},\dots, x_{1 n},\dots,x_{m n}) .

This map is already the “core” of the “averaging” map PPXPXP P X\to P X which gives the multiplication of the monad: the empirical distribution of the sequence

(x 11,,x m1,,x 1n,,x mn) (x_{1 1},\dots,x_{m 1},\dots, x_{1 n},\dots,x_{m n})

is exactly the average of the empirical distributions of the sequences

(x 11,,x m1),,(x 1n,,x mn). (x_{1 1},\dots,x_{m 1}),\dots, (x_{1 n},\dots,x_{m n}) .

We can make the statement concrete by appealing to the universal property of PP. It turns out that there is a unique map E:PPXPXE\colon P P X\to P X extending by continuity the maps E m,n:(X m) nX mnE_{m,n}\colon (X_m)_n \to X_{m n}. This map is precisely the integration, and this way it is even defined without mentioning measure theory at all.

The fact that the resulting EE satisfies the monad axioms comes from the fact that the E m,nE_{m,n} satisfy similar axioms, which can interpreted in terms of adding and removing brackets.

For example, for the analogue of the associativity diagram for the E m,nE_{m,n}, suppose we are given m,n,kNm,n,k\in N, and we consider a triple sequence, i.e. an element of ((X k) m) n((X_k)_m)_n in the form

(((x 111,,x k11),,(x 1m1,,x km1)),,((x 11n,,x k1n),,(x 1mn,,x kmn))). (((x_{1 1 1},\dots ,x_{k 1 1}), \dots , (x_{1 m 1},\dots ,x_{k m 1})),\dots,((x_{1 1 n},\dots ,x_{k 1 n}), \dots , (x_{1 m n},\dots ,x_{k m n}))).

We can first remove the innermost brackets to obtain

((x 111,,x k11,,x 1m1,,x km1),,(x 11n,,x k1n,,x 1mn,,x kmn)) ((x_{1 1 1},\dots ,x_{k 1 1}, \dots , x_{1 m 1},\dots ,x_{k m 1}),\dots,(x_{1 1 n},\dots ,x_{k 1 n}, \dots , x_{1 m n},\dots ,x_{k m n}))

and then remove the remaining inner brackets to obtain

(x 111,,x k11,,x 1m1,,x km1,,x 11n,,x k1n,,x 1mn,,x kmn). (x_{1 1 1},\dots ,x_{k 1 1}, \dots , x_{1 m 1},\dots ,x_{k m 1},\dots,x_{1 1 n},\dots ,x_{k 1 n}, \dots , x_{1 m n},\dots ,x_{k m n}).

Or we can first remove the middle-level brackets to obtain

((x 111,,x k11),,(x 1m1,,x km1),,(x 11n,,x k1n),,(x 1mn,,x kmn)) ((x_{1 1 1},\dots ,x_{k 1 1}), \dots , (x_{1 m 1},\dots ,x_{k m 1}),\dots,(x_{1 1 n},\dots ,x_{k 1 n}), \dots , (x_{1 m n},\dots ,x_{k m n}))

and then remove the remaining inner brackets to obtain

(x 111,,x k11,,x 1m1,,x km1,,x 11n,,x k1n,,x 1mn,,x kmn). (x_{1 1 1},\dots ,x_{k 1 1}, \dots , x_{1 m 1},\dots ,x_{k m 1},\dots,x_{1 1 n},\dots ,x_{k 1 n}, \dots , x_{1 m n},\dots ,x_{k m n}) .

In the two cases the result is the same. This, after taking the colimit, implies that the associativity diagram for E:PPXPXE:P P X\to P X commutes.

In particular, the functors XX nX\mapsto X_n (and also XX nX\mapsto X^n) are said to form a graded monad. For the theory of graded monads see the paper

  • Fujii, Katsumata and Melliès, Towards a formal theory of graded monads. In: Jacobs B., Löding C. (eds) Foundations of Software Science and Computation Structures. FoSSaCS 2016. Lecture Notes in Computer Science, vol 9634. Springer, Berlin, Heidelberg. (pdf)

In general, the statement is that under some mild conditions, the colimit of a graded monad is a (traditional) monad.

Theorem. The functor PP can be equipped with a unique monad structure (P,E,δ)(P,E,\delta) induced by the maps E m,nE_{m,n} and the universal property of PP.

Algebras

We have said above that algebras of a probability monad tend to look like convex spaces. This is the case for the Kantorovich monad too. In particular, algebras have to be objects of our category, complete metric spaces. An ideal candidate is then closed convex subsets of Banach spaces, since they are complete metric spaces, they are convex, and the convex and metric structures interact nicely with each other. This turns out to be indeed the case:

Theorem. The algebras of the Kantorovich monad are exactly the closed convex subsets of Banach spaces, with the induced metric and convex structure.

The proof of this statement relies on the colimit characterization above, and on previous work of Tobias with Valerio Capraro, which in turn is based on the celebrated Stone’s theorem for convex structures. More information about this can be also found in David Corfield’s post here on the Café about convex spaces. Therefore, the Kantorovich monad gives a framework for treating categorically random variables with values in Banach spaces.

Further questions and reading

Some questions arise from the construction of this monad.

  • Is there a way of obtaining the law of large numbers, or a related result, as a category-theoretical statement?
  • Are there generalizations to Lawvere metric spaces?

A more algebraic question is the following.

  • Which other monads arise as colimits of graded monads, how general is the construction?

If you want to know more, there are the papers of Tobias and myself on the topic (here, here, and here), my PhD thesis (pdf), more work is still in progress. There is also Tobias’ recent talk at MIT. Tobias and I are also working on a project at the Applied Category Theory Adjoint School this year, together with Nathan Bedell, Carmen Constantin, Martin Lundfall, and Brandon Shapiro. The project partly involves the Kantorovich monad, and there will be posts about it on this blog in the coming weeks. So, stay tuned!

Posted at March 24, 2019 10:36 AM UTC

TrackBack URL for this Entry:   https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/3100

9 Comments & 0 Trackbacks

Re: The Kantorovich Monad

I’m writing in a hurry, and haven’t thought/read properly, but is there any relation between this monad and the adjunction between the category of (bounded?) metric spaces and the category of Banach spaces, for which the “free” Banach space on a given metric space is sometimes called the Arens–Eells space? (The dual of the Arens–Eells space of X is the space of Lipschitz functions on X.)

Posted by: Yemon Choi on March 24, 2019 12:25 PM | Permalink | Reply to this

Re: The Kantorovich Monad

Great question! For starters, the Arens–Eells space AE(X)AE(X) for a metric space XX is the completion of the normed space consisting of all finitely supported signed measures on XX with total weight 00, equipped with the 1-Wasserstein norm. The latter means that p\|p\| is the least total distance that the individual point masses of pp need to traverse in order for them to eliminate each other and give the zero measure, where the distances are weighted by the mass. For the precise definition, see Section 2.2 of Weaver’s book “Lipschitz Algebras”.

Now the AE(X)AE(X) is not quite the free Banach space generated by XX; in fact the latter does not exist, since the norm of an individual point would have to be arbitrarily large. Rather, I think that the Arens–Eells construction gives the free Banach space AE(X)AE(X) generated by a pointed metric space (X,x 0)(X,x_0); so technically we should write AE(X,x 0)AE(X,x_0), and define it to be the completion of the space of finitely supported signed measures on XX modulo δ x 0\mathbb{R}\delta_{x_0}, equipped with the 1-Wasserstein metric as its norm. Then the unit (X,x 0)AE(X,x 0)(X,x_0)\to AE(X,x_0) is simply given by xδ xx\mapsto \delta_x. And it just so happens that for two different choices of basepoint, the two Banach spaces AE(X,x 0)AE(X,x_0) are canonically isomorphic.

Putting this aside, yes, the Arens–Eells space is indeed closely related to the Kantorovich monad! Similar to the construction of AE(X,x 0)AE(X,x_0) sketched above, you get our PXPX by taking the completion of the space of finitely supported probablity measures on XX, equipped with the 1-Wasserstein metric. In other words, while AE(X,x 0)AE(X,x_0) is the free Banach space generated by a pointed metric space, our PXPX is the free closed convex subset of a Banach space generated by a complete metric space XX. (The completeness is not really relevant here, since we need it only for the proof that the elements of PXPX are Radon measures on XX.)

So in particular we get an Arens–Eells monad on the category of pointed metric spaces. I guess that its algebras are exactly the Banach spaces, meaning that the forgetful functor from Banach spaces to pointed metric spaces is monadic. But I haven’t thought about this carefully.

Posted by: Tobias Fritz on March 24, 2019 9:33 PM | Permalink | Reply to this

Re: The Kantorovich Monad

The parenthesis intended to be part of the link to “valuations” accidentally escaped to become part of the following text.

Posted by: LSpice on March 31, 2019 9:33 PM | Permalink | Reply to this

Re: The Kantorovich Monad

Thanks, it’s sorted now. The brackets in the link name were confusing the parser, so the brackets needed converting to html entities.

Posted by: Simon Willerton on April 1, 2019 7:51 PM | Permalink | Reply to this

Re: The Kantorovich Monad

I understand by the paper that the algebras are a generalization of the integral, but don’t you get the Bochner integral with this?

Posted by: Carlos on April 4, 2019 4:07 AM | Permalink | Reply to this

Re: The Kantorovich Monad

The algebra map of a general (region of a) Banach space is exactly the Bochner integral, yes.

Posted by: Paolo Perrone on April 4, 2019 9:49 AM | Permalink | Reply to this

colimit monad

Paolo asks

Which other monads arise as colimits of graded monads, how general is the construction?

Since a monad graded in a monoidal category, MM, is a lax bifunctor from its delooping, BM\mathbf{B} M, to CatCat (or why not to any 2-category?), perhaps there are clever things to say about left Kan extensions of lax bifunctors along BM1\mathbf{B} M \to 1.

Posted by: David Corfield on July 5, 2019 10:19 AM | Permalink | Reply to this

Re: The Kantorovich Monad

Nice post ! Thank you !

I am an actuary and a very big point for us is the LLN.

Did you have the possibility to progress on the open question “Is there a way of obtaining the law of large numbers, or a related result, as a category-theoretical statement?”

Thank you very much in advance

Posted by: Pierre Ars on June 18, 2021 11:16 AM | Permalink | Reply to this

Re: The Kantorovich Monad

There has been some progress on the law of large numbers, but we’re not really there yet. I recently gave a talk on the LLN in categorical probability that is available here. Since then I’ve had some further ideas on how one could potentially prove it rather than putting it in as a definition, but they’re still inconclusive.

What we do already have is a proof of the de Finetti theorem.

Posted by: Tobias Fritz on June 18, 2021 6:10 PM | Permalink | Reply to this

Post a New Comment