

September 12, 2023

Coalgebraic Behavioural Metrics: Part 1

Posted by Emily Riehl

guest post by Keri D’Angelo, Johanna Maria Kirss, Matina Najafi and Wojtek Rozowski

Long ago, coalgebras of all kinds lived together: deterministic and nondeterministic automata, transition systems of the labelled and probabilistic varieties, finite and infinite streams, and any other arrows $\alpha : X \to F X$ for an arbitrary endofunctor $F : \mathcal{C} \to \mathcal{C}$.


These different systems were governed by a unifying theory of $F$-coalgebra which allowed them to understand each other in their differences and become stronger by a sense of sameness all at once.


However, the land of coalgebra was ruled by a rigid principle. Its name was Behavioural Equivalence, and many felt it was too harsh in its judgement.


Some more delicate transition systems, like the probabilistic kinds, found being governed by behavioural equivalence unjust towards their nuanced and minute differences, and slowly, division began to brew among the peoples.


More and more did the coalgebras find within themselves small distinctions that did not quite outweigh their similarities, and more and more did they feel called to express that in their measurements. The need for a notion of behavioural distance gained momentum, but various paths towards a behavioural distance emerged.


Transition systems as coalgebras

State transition systems are a pervasive object in theoretical computer science. Consider a deterministic finite automaton. The usual textbook definition (see, for example, “Automata and Computability” by Dexter Kozen) of a DFA is a 5-tuple $(Q, \Sigma, \delta : Q \times \Sigma \to Q, F \subseteq Q, q_0)$ consisting of a finite set of states $Q$, an alphabet $\Sigma$, and a transition function $\delta : Q \times \Sigma \to Q$ which, given a state $q \in Q$ and a letter $a \in \Sigma$, yields the state $\delta(q, a) \in Q$ obtained by reading the letter $a$ in the state $q$. The subset $F \subseteq Q$ describes the set of accepting states, and $q_0$ denotes the initial state. The set of words over an alphabet $\Sigma$, which we denote by $\Sigma^{\ast}$, is the least set satisfying the following:

$$\epsilon \in \Sigma^{\ast} \quad \text{(the empty word } \epsilon \text{ is a word)}$$

$$\text{If } a \in \Sigma \text{ and } w \in \Sigma^{\ast} \text{, then } a w \in \Sigma^{\ast} \quad \text{(concatenating a letter to a word yields a word)}$$

We can now extend the transition function to operate on words. Define $\hat{\delta} : Q \times \Sigma^{\ast} \to Q$ by induction on the construction of a word:

$$\hat{\delta}(q, \epsilon) = q \qquad\qquad \hat{\delta}(q, a w) = \hat{\delta}(\delta(q, a), w)$$

A word $w$ is accepted by the automaton if $\hat{\delta}(q_0, w) \in F$. The set of all accepted words $\{w \in \Sigma^{\ast} \mid \hat{\delta}(q_0, w) \in F\} \in 2^{\Sigma^{\ast}}$ is called the language of the automaton. The languages that can be accepted by a deterministic automaton with finitely many states are called regular or rational languages.
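To make this concrete, here is a minimal Haskell sketch of a DFA together with the extended transition function and acceptance. The names (`DFA`, `deltaHat`, `accepts`) and the example automaton are our own, purely for illustration:

```haskell
-- A DFA with state type q over an alphabet of Chars: a transition
-- function, a set of accepting states, and an initial state.
data DFA q = DFA
  { delta     :: q -> Char -> q
  , accepting :: q -> Bool
  , initial   :: q
  }

-- The extended transition function \hat{delta}, defined by
-- induction on the word, exactly as in the text.
deltaHat :: DFA q -> q -> String -> q
deltaHat _   q []      = q
deltaHat dfa q (a : w) = deltaHat dfa (delta dfa q a) w

-- A word is accepted if \hat{delta}(q0, w) lands in F.
accepts :: DFA q -> String -> Bool
accepts dfa w = accepting dfa (deltaHat dfa (initial dfa) w)

-- Example (ours): accept words with an even number of 'a's.
evenAs :: DFA Bool
evenAs = DFA
  { delta     = \q a -> if a == 'a' then not q else q
  , accepting = id
  , initial   = True
  }
-- accepts evenAs "abab"  ==  True
```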

Now, let’s slightly relax and rearrange the definitions.

First, let’s drop the idea of having an initial state. Instead of the language of the automaton, we will now speak of the language of a state $q \in Q$ of an automaton. Every automaton induces a function $\dagger : Q \to 2^{\Sigma^{\ast}}$ given by $q \mapsto \{w \in \Sigma^{\ast} \mid \hat{\delta}(q, w) \in F\}$, and the set $\dagger(q)$ is called the language of the state $q$.

We can view the set of accepting states as a function $o : Q \to 2$ to the two-element set, such that $o(q) = 1 \Leftrightarrow q \in F$. Currying the transition function yields $\delta : Q \to Q^{\Sigma}$. Now, we can use the cartesian product to combine the functions $o$ and $\delta$ into a function $\langle o, \delta \rangle : Q \to 2 \times Q^{\Sigma}$. Intuitively, such a function takes a state to its one-step observable behaviour. Finally, we will drop the requirement that the set of states be finite. Automata with infinitely many states can denote more expressive languages from the Chomsky hierarchy, such as the context-free languages.

Each deterministic automaton can now be viewed as a pair $(Q, \alpha : Q \to 2 \times Q^{\Sigma})$. More interestingly, we can endow the set of all languages $2^{\Sigma^{\ast}}$ with the structure of a deterministic automaton. Define the acceptance function $\epsilon? : 2^{\Sigma^{\ast}} \to 2$ by $\epsilon?(L) = 1 \Leftrightarrow \epsilon \in L$. In other words, a language will be an accepting state if and only if it contains the empty word. Now, define a transition function $(-)_a : 2^{\Sigma^{\ast}} \times \Sigma \to 2^{\Sigma^{\ast}}$ by $(L)_a = \{w \mid a w \in L\}$. Intuitively, given a language $L$ and a letter $a$, we transition to the language obtained by cutting that leading letter from the words of $L$ that start with $a$. The transition structure $\langle \epsilon?, (-)_a \rangle$ is often called the semantic Brzozowski derivative, and the automaton $2^{\Sigma^{\ast}} \to 2 \times (2^{\Sigma^{\ast}})^{\Sigma}$ is called the (semantic) Brzozowski automaton.
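The same repackaging, continued in Haskell; representing a language by its characteristic function is our own choice here:

```haskell
-- A deterministic automaton, coalgebraically: a carrier q together
-- with a map q -> (Bool, Char -> q), i.e. q -> 2 x q^Sigma.
type DetAut q = q -> (Bool, Char -> q)

-- A language over Char, as its characteristic function.
type Lang = String -> Bool

-- The semantic Brzozowski automaton: the set of all languages,
-- itself carrying a deterministic-automaton structure.
brzozowski :: DetAut Lang
brzozowski l = (l [], \a -> \w -> l (a : w))
--              ^ epsilon?       ^ the a-derivative (L)_a

-- The map sending each state to its language; this is the unique
-- homomorphism into the Brzozowski automaton discussed below.
languageOf :: DetAut q -> q -> Lang
languageOf aut q []      = fst (aut q)
languageOf aut q (a : w) = languageOf aut (snd (aut q) a) w
```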

It can be observed that the Brzozowski automaton provides a form of universal automaton in which every other deterministic automaton embeds in a well-behaved way. By well-behaved, we mean the following:

  • if a state $q \in Q$ in some automaton accepts, then its language $\dagger(q)$ contains the empty word, and so that language is an accepting state in the Brzozowski automaton;
  • if a state transitions to another state on some letter $a \in \Sigma$, then in the Brzozowski automaton, the language of the first state transitions to the language of the second state on the same letter $a$.

Formally speaking, let $\langle o, \delta \rangle : Q \to 2 \times Q^{\Sigma}$ be a deterministic automaton with $Q$ being the set of states. The function $\dagger_{\langle o, \delta \rangle} : Q \to 2^{\Sigma^{\ast}}$ taking each state to its language satisfies the following for all $q \in Q$:

  • $o(q) = \epsilon?(\dagger_{\langle o, \delta \rangle}(q))$;
  • for all $a \in \Sigma$, $\left(\dagger_{\langle o, \delta \rangle}(q)\right)_a = \dagger_{\langle o, \delta \rangle}(\delta(q)(a))$.

In other words, the map $\dagger_{\langle o, \delta \rangle}$ to the Brzozowski automaton is structure-preserving. More generally, functions between any two deterministic automata satisfying the conditions above are called homomorphisms. One can easily show that the Brzozowski automaton satisfies a certain universal property, namely that every other deterministic automaton admits a unique homomorphism to it (which is precisely the map assigning to each state its language). We can use the Brzozowski automaton and the language-assigning map to talk about the behaviour of states: we will say that two states are behaviourally equivalent if they are mapped to the same language in the Brzozowski automaton.

To sum up, the set of languages equipped with an automaton structure provides a universal and fully abstract treatment of the behaviours of states of deterministic automata.

Now, consider a different object of study within theoretical computer science and logic: Kripke frames. The usual way one gives the semantics of modal logic is through Kripke frames. A Kripke frame is a pair $(W, R \subseteq W \times W)$ consisting of a set of worlds $W$ and an accessibility relation $R \subseteq W \times W$. A valuation $v : W \to \mathcal{P}(AP)$ is a function which assigns to each world the set of atomic propositions true in it. A Kripke frame along with a valuation forms a Kripke model. Given two Kripke models $((W_1, R_1 \subseteq W_1 \times W_1), v_1)$ and $((W_2, R_2 \subseteq W_2 \times W_2), v_2)$, we call a map $f : W_1 \to W_2$ a $p$-morphism if the following conditions hold:

  • if $u \mathrel{R_1} v$, then $f(u) \mathrel{R_2} f(v)$;
  • if $s \mathrel{R_2} t$ for some $s = f(u)$, then $u \mathrel{R_1} w$ and $t = f(w)$ for some $w \in W_1$;
  • $p \in v_1(w)$ if and only if $p \in v_2(f(w))$.

Such $p$-morphisms are well-behaved morphisms preserving the structure of Kripke models (and hence the validity of modal formulae), and are thus somewhat similar in spirit to the homomorphisms between deterministic automata.

Observe that an accessibility relation $R \subseteq W \times W$ on a set of worlds can equivalently be viewed as a function $W \to \mathcal{P}(W)$. Similarly to deterministic automata, we can use the cartesian product to combine the accessibility relation and the valuation of a model into a function $\langle R, v \rangle : W \to \mathcal{P}(W) \times \mathcal{P}(AP)$ which takes a world to its one-step observable behaviour. A Kripke model then becomes a pair $(W, \alpha : W \to \mathcal{P}(W) \times \mathcal{P}(AP))$.
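In the same Haskell spirit as above (using `Data.Set` as a stand-in for the powerset, which is our own simplification and restricts attention to finitely many accessible worlds):

```haskell
import qualified Data.Set as Set

type Prop = String

-- A Kripke model as a coalgebra: each world is sent to the set of
-- worlds it can access, together with the set of atomic
-- propositions true in it, i.e. a map W -> P(W) x P(AP).
type Kripke w = w -> (Set.Set w, Set.Set Prop)
```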

Both deterministic automata and Kripke models are pairs consisting of a set of states, and a one-step transition function which takes a state into some set-theoretic construction describing its observable behaviour. These set-theoretic constructions are endofunctors $\mathsf{Set} \to \mathsf{Set}$, in our case $2 \times (-)^{\Sigma}$ and $\mathcal{P}(-) \times \mathcal{P}(AP)$. Deterministic automata and Kripke models are concrete instances of coalgebras for an endofunctor.

Let’s spell out the concrete definitions.

Let $F : \mathcal{C} \to \mathcal{C}$ be an endofunctor on some category. An $F$-coalgebra is a pair $(X, \alpha : X \to F X)$ consisting of an object $X$ of the category $\mathcal{C}$ and an arrow $\alpha : X \to F X$.

A homomorphism from an $F$-coalgebra $(X, \beta)$ to $(Y, \gamma)$ is a map $f : X \to Y$ satisfying $\gamma \circ f = F f \circ \beta$. Homomorphisms of deterministic automata and $p$-morphisms are concrete instantiations of this definition.

Coalgebras and their homomorphisms form a category. Under appropriate size restrictions on the functor $F$, there exists a final coalgebra $(\nu F, t)$, which is a final object in the category of coalgebras. In other words, it satisfies the universal property that for every $F$-coalgebra $(X, \beta)$ there exists a unique homomorphism $\dagger\beta : X \to \nu F$. Final coalgebras provide an abstract domain giving a denotation of the behaviour of $F$-coalgebras. We already saw that in the case of deterministic automata, the Brzozowski automaton provided a final coalgebra, where formal languages denoted the behaviour of states of automata. Finally, the concrete notion of behavioural equivalence was given by language equivalence.
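In Haskell, the general definition and the unique map into the final coalgebra have a familiar rendering (a sketch only: Haskell’s laziness blurs the distinction between least and greatest fixpoints, so `Nu f` is the final coalgebra in spirit rather than on the nose):

```haskell
-- An F-coalgebra on a carrier x is just a map x -> f x.
type Coalg f x = x -> f x

-- The "final coalgebra": the greatest fixpoint of f.
newtype Nu f = Nu { out :: f (Nu f) }

-- The unique homomorphism from any coalgebra into Nu f,
-- usually called the anamorphism or unfold.
unfold :: Functor f => Coalg f x -> x -> Nu f
unfold c = Nu . fmap (unfold c) . c
```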

Behavioural equivalence of quantitative systems

Having introduced the motivation for the study of coalgebras, we move on to a particular example which will motivate looking at behavioural distance. Consider the endofunctor $\mathcal{D} : \mathsf{Set} \to \mathsf{Set}$ which takes each set $X$ to the set $\mathcal{D}X = \{\mu : X \to [0,1] \mid \sum_{x \in X} \mu(x) = 1\}$ of discrete probability distributions over $X$. On arrows $f : X \to Y$, we have that $\mathcal{D}f : \mathcal{D}X \to \mathcal{D}Y$ is given by $\mathcal{D}f(\mu)(y) = \sum_{f(x) = y} \mu(x)$, also known as the pushforward measure. Coalgebras for $\mathcal{D}$ are precisely discrete-time Markov chains. Since all the states in such a case have no extra observable behaviour besides the transition probability, it turns out that all the states are behaviourally equivalent. Let’s consider a slightly more involved example, where each state can also be terminating with some probability.

Consider the composite functor $\mathcal{D}(1 + (-))$ that takes a set $X$ to $\mathcal{D}(\{\ast\} \cup X) := \mathcal{D}(1 + X)$. Coalgebras for it can be thought of as Markov chains with potential deadlock behaviour. Now, let’s look at a concrete example of a four-state automaton.

[figure: a four-state probabilistic automaton with states $x$, $y$, $z$ and $u$, as described below]

State $z$ enters deadlock with probability $1$, while state $u$ has a self-loop with probability $1$. The functor $\mathcal{D}(1 + (-))$ admits a final coalgebra, and it is not too hard to observe that states $u$ and $z$ exhibit completely different behaviour and will be mapped into distinct elements of the carrier of the final coalgebra. State $y$ has probability $\frac{1}{2}$ of ending in $u$ and probability $\frac{1}{2}$ of ending in $z$: it has two equiprobable options of transitioning to states which behave differently from each other. It is not too hard to see that $y$ is also not behaviourally equivalent to $u$ and $z$. Finally, consider the state $x$. Let $\epsilon \in [0, \frac{1}{2}]$. State $x$ can transition to $u$ with probability $\frac{1}{2} - \epsilon$ and to $z$ with probability $\frac{1}{2} + \epsilon$. Observe that only when $\epsilon = 0$ is state $x$ behaviourally equivalent to $y$. Even for very small non-zero values of $\epsilon$, states $x$ and $y$ would be considered inequivalent.
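Here is one way to encode this chain in the running Haskell sketch (representing $\mathcal{D}(1 + X)$ by finitely-supported lists of outcome-probability pairs is our own choice):

```haskell
data State = X | Y | Z | U deriving (Eq, Show)

-- D(1 + S): a finitely-supported distribution over Maybe S,
-- where Nothing is the deadlock outcome *.
type Dist s = [(Maybe s, Rational)]

-- The four-state chain, parametrised by epsilon in [0, 1/2].
chain :: Rational -> State -> Dist State
chain eps X = [(Just U, 1/2 - eps), (Just Z, 1/2 + eps)]
chain _   Y = [(Just U, 1/2), (Just Z, 1/2)]
chain _   Z = [(Nothing, 1)]   -- deadlock with probability 1
chain _   U = [(Just U, 1)]    -- self-loop with probability 1
```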

For practical purposes, coalgebraic behavioural equivalence (for coalgebras on $\mathsf{Set}$) might be way too restrictive. In the case of systems exhibiting quantitative effects, it would be more desirable to consider a more robust notion of behavioural distance, which would quantify how similarly two states behave. The abstract idea of distance has been captured by the mathematical notion of metric spaces and related objects.

Pseudometrics and $\mathsf{PMet}$

Underlying the behavioural distance between states of a coalgebra is the mathematical idea of distance. Most commonly, distance is treated in the form of a metric space. A metric space is a set $X$ with a distance function $d : X \times X \to [0, \infty)$ that satisfies the following axioms for all $x, y, z \in X$:

  • Separation: $d(x,y) = 0 \iff x = y$;
  • Symmetry: $d(x,y) = d(y,x)$;
  • Triangle inequality: $d(x,z) \leq d(x,y) + d(y,z)$.

In the context of coalgebras, however, we may wish to have a slightly altered notion of distance to capture behavioural similarity.

If two states behave identically, then even if they are not the same state, we expect their distance to be zero. Thus, we relax the separation axiom and only require that $d(x,x) = 0$ for all $x \in X$. This yields a pseudometric. Note that a pseudometric is still symmetric and still fulfills the triangle inequality. It is also possible to have an asymmetric notion of distance, where both separation and symmetry are relaxed. The distance function obtained this way is a function $d : X \times X \to [0, \infty)$ that only fulfills the triangle inequality; it is possible that $d(x,y) = 0$ for some $x \neq y$, and that $d(x,y) \neq d(y,x)$. This is called a hemimetric, and it occurs when formalising distance via fuzzy lax extensions.

Finally, we may wish to bound the distance, to have a way of expressing when two states are “maximally different”. For example, we use this to express that an accepting and a non-accepting state are maximally different. In order to accommodate that, we instead define the distance function as $d : X \times X \to [0, \top]$, where $\top \in (0, \infty]$. In the context of behavioural distances on coalgebras, it is common to take $\top = 1$ or $\top = \infty$. The case $\top = \infty$ requires a way of calculating with infinity, which is resolved by setting
$$d(x, \infty) = \infty \text{ for } x \neq \infty, \qquad d(\infty, \infty) = 0, \qquad x + \infty = \infty \text{ for } x \in [0, \infty].$$

Once we fix the bound $\top$ and some set $X$, we can consider the set $\mathsf{PMet}(X)$ of all pseudometrics $d$ over $X$. It turns out that under the pointwise order and appropriate definitions of meet and join, $\mathsf{PMet}(X)$ becomes a lattice. In particular, the order is given by
$$d_1 \leq d_2 \iff d_1(x,y) \leq d_2(x,y) \quad \forall x, y \in X,$$
and for a subset $D \subseteq \mathsf{PMet}(X)$,
$$(\sup D)(x,y) := \sup\{d(x,y) \mid d \in D\}, \qquad \inf D := \sup\{d \mid d \in \mathsf{PMet}(X) \land \forall d' \in D.\ d \leq d'\}.$$

Moreover, it is a complete lattice, meaning each subset has a meet and a join. The lattice being complete implies that the category of all pseudometric spaces for a fixed \top is complete and cocomplete, which in turn implies that the category has products and coproducts.

Here’s how we define this category of pseudometric spaces:

The category $\mathsf{PMet}$ for a fixed $\top \in (0, \infty]$ has

  • pseudometric spaces $(X, d_X)$ as objects,
  • nonexpansive maps $f : (X, d_X) \to (Y, d_Y)$ as morphisms.

The definition of nonexpansive maps is quite immediate:
$$d_Y(f(x), f(y)) \leq d_X(x, y) \quad \forall x, y \in X,$$
i.e. the maps do not increase distances between elements. Clearly, the composition of nonexpansive maps is nonexpansive as well, and the identity map is nonexpansive.

Motivation from transportation theory

In the long run, we would like to be able to go from distances on the states of a coalgebra to distances on observable behaviours. For now, let’s focus our attention on the discrete distributions functor. Let $X$ be a set equipped with a $\top$-pseudometric structure on it, given by $d_X : X \times X \to [0, \top]$, and let’s say we have two distributions $\nu, \mu \in \mathcal{D}X$. We would like to define a pseudometric structure $d_{\mathcal{D}X} : \mathcal{D}X \times \mathcal{D}X \to [0, \top]$ which would allow us to calculate the distance between $\nu$ and $\mu$.

It turns out there is a well-known answer to this question, coming from the field of transportation theory. Let’s make the example slightly more concrete. Let $X = \{A, B, C\}$; from now on we will omit the subscript on the map $d_X : X \times X \to [0, \top]$.

Imagine that $A$, $B$ and $C$ are three bakeries, each with an adjacent shop where one can taste the pastries. We have that $d(A,A) = d(B,B) = d(C,C) = 0$. Moreover, let’s say that the distance between $A$ and $B$ is three ($d(A,B) = d(B,A) = 3$), while $B$ and $C$ are at distance four ($d(B,C) = d(C,B) = 4$) and $A$ and $C$ are at distance five ($d(A,C) = d(C,A) = 5$). Hence, $(X, d)$ is a metric space.

Let $\nu \in \mathcal{D}X$ be the distribution of the supply of pastries at each of the bakeries: $\nu(A) = 0.7$, $\nu(B) = 0.1$ and $\nu(C) = 0.2$.

Let $\mu \in \mathcal{D}X$ be the distribution of demand at the pastry shops adjacent to the bakeries: $\mu(A) = 0.2$, $\mu(B) = 0.3$ and $\mu(C) = 0.5$.

[figure: the bakeries $A$, $B$, $C$ with their pairwise distances $3$, $4$, $5$, supply $\nu$ and demand $\mu$]

In bakery $A$, more pastries are produced than consumed, while at $B$ and $C$ more people want a pastry than the bakeries can actually produce. The reasonable thing for the owner of these places to do would be to redistribute the pastries so that the needs of the customers are satisfied. One way of solving this problem would be to come up with a transport plan, a map $t : X \times X \to [0, 1]$, where $t(x, y) = k$ would intuitively mean: move the fraction $k$ of the pastries from bakery $x$ to bakery $y$. Such a transport plan should satisfy

  • For all $x \in X$, $\nu(x) = \sum_{x' \in X} t(x, x')$ (the whole supply of pastries is used);
  • For all $x \in X$, $\mu(x) = \sum_{x' \in X} t(x', x)$ (all demand is satisfied).

In other words, one can think of $t : X \times X \to [0,1]$ as a joint probability distribution $t \in \mathcal{D}(X \times X)$ which can be marginalised into $\nu$ and $\mu$. Such a joint distribution, which equivalently describes a transportation plan, is called a coupling of $\nu$ and $\mu$. There can be multiple transportation plans, but some might involve unnecessary movement of pastries to a bakery which is too far away. Each transportation plan can be assigned a cost, proportional to the distance that each fraction of pastries needs to travel. The cost of a plan $t$ is given by
$$c_t = \sum_{(x,y) \in X \times X} d(x,y)\, t(x,y).$$

Let $\Gamma(\nu, \mu)$ denote the set of all couplings of $\nu$ and $\mu$. The owner of the bakeries is aiming to minimise the total cost of transportation. The minimal cost of the transportation from $\nu$ to $\mu$ is given by
$$d^{\mathcal{D}\downarrow}(\nu, \mu) = \min\Big\{\sum_{(x,y) \in X \times X} d(x,y)\, t(x,y) \;\Big|\; t \in \Gamma(\nu, \mu)\Big\}.$$

It turns out that phrasing this minimisation problem as a linear program guarantees the existence of an optimal coupling, one which costs the least. The definition above actually takes a pseudometric $d : X \times X \to [0, \top]$ and transforms it into a pseudometric $d^{\mathcal{D}\downarrow} : \mathcal{D}X \times \mathcal{D}X \to [0, \top]$ on distributions over $X$. This (pseudo)metric is known as the Wasserstein metric.

In our example, it turns out that the most cost-optimal plan is the following:

  • move $\frac{1}{5}$ of the pastries from $A$ (where there is overproduction) to $B$;
  • move $\frac{3}{10}$ of the pastries from $A$ to $C$.

The optimal cost is given by
$$d^{\mathcal{D}\downarrow}(\nu, \mu) = \frac{1}{5} \cdot 3 + \frac{3}{10} \cdot 5 = 2.1.$$
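As a sanity check, here is a small self-contained Haskell snippet (our own, not from the post) that encodes this plan, recovers $\nu$ and $\mu$ as its marginals, and evaluates its cost:

```haskell
-- The metric on the three bakeries, as in the text.
d :: Char -> Char -> Rational
d x y | x == y = 0
d x y = case [x, y] of
  s | s `elem` ["AB", "BA"] -> 3
    | s `elem` ["BC", "CB"] -> 4
    | otherwise             -> 5

-- The transport plan from the text: mass that stays put, plus the
-- two moves A -> B (1/5) and A -> C (3/10).
plan :: [((Char, Char), Rational)]
plan = [(('A','A'), 1/5), (('B','B'), 1/10), (('C','C'), 1/5),
        (('A','B'), 1/5), (('A','C'), 3/10)]

-- Row sums recover the supply nu, column sums the demand mu.
nu, mu :: Char -> Rational
nu p = sum [k | ((x, _), k) <- plan, x == p]
mu p = sum [k | ((_, y), k) <- plan, y == p]

-- The cost of the plan: sum of d(x,y) * t(x,y) == 21/10 = 2.1.
cost :: Rational
cost = sum [d x y * k | ((x, y), k) <- plan]
```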

Now, consider an alternative approach. Let’s say that instead of organising the transport on their own, the owner of the bakeries hires an external company to do it. In this setting, the company assigns to each bakery a price at which it buys pastries (in the case of overproduction) or sells them (in the case of higher demand). Formally, this is modelled as a function $f : X \to \mathbb{R}^{+}$ satisfying, for all $x, y \in X$,
$$|f(x) - f(y)| \leq d(x,y).$$
One can think of this requirement as saying that the owner of the bakeries won’t accept a situation where they have to pay more for the transport than when performing the transport on their own. Formally speaking, it means that $f$ is a nonexpansive function from $(X, d)$ to the nonnegative reals equipped with the Euclidean metric $d_e$. Given a price plan $f : X \to \mathbb{R}^{+}$, the income of the company is given by
$$\sum_{x \in X} f(x)\,(\mu(x) - \nu(x)).$$

The external company would like to maximise its profit, so the optimal profit is given by
$$d^{\mathcal{D}\uparrow}(\mu, \nu) = \sup\Big\{\sum_{x \in X} f(x)\,(\mu(x) - \nu(x)) \;\Big|\; f : (X, d) \to (\mathbb{R}^{+}, d_e) \text{ nonexpansive}\Big\}.$$

The optimal pricing plan exists (again by formulating the problem as a linear program), so the above is well-defined. $d^{\mathcal{D}\uparrow}$ happens to be a pseudometric, as long as $d$ is. This notion of distance between distributions is known as the Kantorovich metric.

For the example above, the optimal price plan is given by the following:

  • the owner of the bakeries gives out the excess from $A$ for free ($f(A) = 0$);
  • bakery $B$ buys the necessary pastries for three monetary units ($f(B) = 3$);
  • bakery $C$ does the same at a cost of five units ($f(C) = 5$).

In this case, the distance is given by
$$d^{\mathcal{D}\uparrow}(\mu, \nu) = 0 \cdot (0.2 - 0.7) + 3 \cdot (0.3 - 0.1) + 5 \cdot (0.5 - 0.2) = 2.1.$$
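The dual side can be checked the same way. This snippet (again ours, self-contained) verifies that the price plan is nonexpansive with respect to the distances above and that its income matches the primal cost of $2.1$:

```haskell
-- The metric, price plan, supply and demand from the text.
dist :: Char -> Char -> Rational
dist x y | x == y = 0
dist x y = case [x, y] of
  s | s `elem` ["AB", "BA"] -> 3
    | s `elem` ["BC", "CB"] -> 4
    | otherwise             -> 5

price :: Char -> Rational
price 'A' = 0; price 'B' = 3; price 'C' = 5

supply, demand :: Char -> Rational
supply 'A' = 7/10; supply 'B' = 1/10; supply 'C' = 2/10
demand 'A' = 2/10; demand 'B' = 3/10; demand 'C' = 5/10

-- Nonexpansiveness: |f x - f y| <= d(x, y) for all pairs.
ok :: Bool
ok = and [abs (price x - price y) <= dist x y | x <- "ABC", y <- "ABC"]

-- The company's income: sum of f(x) * (mu(x) - nu(x)) == 21/10.
income :: Rational
income = sum [price x * (demand x - supply x) | x <- "ABC"]
```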

In this particular case, we ended up with the same distance as before, when talking about transport plans. It is no coincidence, and actually an instance of a quite famous result known as Kantorovich-Rubinstein duality. We have that
$$d^{\mathcal{D}\uparrow} = d^{\mathcal{D}\downarrow}.$$
One can intuitively think of minimising the transportation cost as dual to the external company trying to maximise its profit.

Liftings to $\mathsf{PMet}$

The above distances can also be thought of as answers to the following lifting problem. If $F$ is a functor on $\mathsf{Set}$, then how can one construct a functor $\overline{F}$ on $\mathsf{PMet}$ such that the square

$$\begin{array}{ccc} \mathsf{PMet} & \overset{\overline{F}}{\longrightarrow} & \mathsf{PMet} \\ \downarrow & & \downarrow \\ \mathsf{Set} & \overset{F}{\longrightarrow} & \mathsf{Set} \end{array}$$

(with the vertical arrows the forgetful functor)

commutes? In other words, how can we construct a functor $\overline{F} : \mathsf{PMet} \to \mathsf{PMet}$ that acts as $F$ on the underlying sets and nonexpansive functions?

The first priority when defining a lifting is that $\overline{F}$ should be a functor. That is, when given a pseudometric space $(X, d)$, we need to define a pseudometric on $F X$ in a sensible way. Since the definition of a lifting implies that $\overline{F} f$ is just $F f$ for all nonexpansive $f : (X, d_X) \to (Y, d_Y)$, and we need $F f$ to be nonexpansive, the pseudometrics on the objects $F X$ and $F Y$ need to ensure that $F f$ is nonexpansive. Then, and only then, does $\overline{F}$ become a functor.

Put precisely, in order to have a lifting, we require that

  • for any pseudometric space $(X, d_X)$, we have a pseudometric space $(F X, d_X^F)$, such that
  • for any nonexpansive function $f : (X, d_X) \to (Y, d_Y)$, the function $F f : (F X, d_X^F) \to (F Y, d_Y^F)$ is nonexpansive.

In the above case of the distance between two distributions, an initial distance $d_X$ between elements of $X$ was given, and the distance on $\mathcal{D}X$ was defined. The functoriality of such a construction was not checked, but it holds (and the proof in the abstract case can be found in the referential paper).

Inspired by the constructions in the case of $\mathcal{D}$, Kantorovich and Wasserstein liftings can be defined for arbitrary functors. This is also where the relevance to coalgebras becomes evident – the notion of a lifting enables us to start with a pseudometric on the state space $X$, and induce from that a pseudometric on the space of possible behaviours $F X$.

In the mechanics of lifting a pseudometric, it is useful to consider an evaluation function: a function $ev_F : F[0, \top] \to [0, \top]$. Then, for every map $f : X \to [0, \top]$, we can consider the function $\widetilde{F}f : F X \to [0, \top]$ defined as $\widetilde{F}f = ev_F \circ F f$. Certain choices of evaluation functions arise naturally for particular functors. For example, in the case of $\mathcal{D}$ (with $\top = 1$), the evaluation function $\mathcal{D}[0,1] \to [0,1]$ is taken to be the expected value $\mathbb{E}(-)$.
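In the running Haskell sketch, the evaluation function for a finitely-supported version of $\mathcal{D}$ looks as follows (our own rendering; `D`, `evD` and `dTilde` are illustrative names):

```haskell
-- Finitely-supported distributions as (point, probability) pairs.
newtype D a = D [(a, Rational)]

-- The functor action is the pushforward from earlier.
instance Functor D where
  fmap f (D xs) = D [(f a, p) | (a, p) <- xs]

-- ev_D : D [0,1] -> [0,1] is expectation.
evD :: D Rational -> Rational
evD (D xs) = sum [r * p | (r, p) <- xs]

-- \tilde{D} f = ev_D . D f  sends f : X -> [0,1] to D X -> [0,1].
dTilde :: (x -> Rational) -> D x -> Rational
dTilde f = evD . fmap f
```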

Kantorovich lifting

The Kantorovich lifting is defined as follows. Let $F$ be a functor on $\mathsf{Set}$ with an evaluation function $ev_F$. Then the Kantorovich lifting is the functor $\overline{F}$ that takes $(X, d)$ to the space $(F X, d^{\uparrow F})$, where
$$d^{\uparrow F}(t_1, t_2) := \sup\{\, d_e(\widetilde{F}f(t_1), \widetilde{F}f(t_2)) \mid f : (X, d) \to ([0,\top], d_e) \text{ nonexpansive} \,\}.$$

This is in fact the smallest possible pseudometric that makes all functions $\widetilde{F}f : F X \to [0, \top]$ nonexpansive. The Kantorovich lifting preserves isometries.
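For instance (our own unwinding of the definition, assuming $ev_{\mathcal{D}} = \mathbb{E}$ as above), for $F = \mathcal{D}$ we get $\widetilde{\mathcal{D}}f(\mu) = \sum_{x \in X} f(x)\,\mu(x)$, so the abstract definition specialises to
$$d^{\uparrow\mathcal{D}}(\mu, \nu) = \sup\Big\{\, \Big|\sum_{x \in X} f(x)\,\mu(x) - \sum_{x \in X} f(x)\,\nu(x)\Big| \;\Big|\; f : (X, d) \to ([0,1], d_e) \text{ nonexpansive} \Big\},$$
which is precisely the Kantorovich metric from the bakery example (the absolute value is harmless, since swapping $\mu$ and $\nu$ ranges over the same class of functions).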

Wasserstein lifting

The Wasserstein distance $d^{\downarrow F}$ that is required to define the Wasserstein lifting needs more work to set up. Altogether, it requires a generalised definition of couplings, some well-behavedness conditions on the evaluation function, and the preservation of weak pullbacks by the functor.

Posted at September 12, 2023 10:14 PM UTC


1 Comment

Re: Coalgebraic behavioural metrics - Part 1

The functoriality of such a construction was not checked, but it holds (and the proof in the abstract case can be found in the referential paper).

I didn’t see any references or links in this post. Did they get eaten during the posting process?

Posted by: RodMcGuire on September 13, 2023 8:42 PM
