### Optimal Transport and Enriched Categories II

#### Posted by Simon Willerton

### The dual transport problem from general linear programming duality

Last time I described the basic, discrete optimal transport problem in which there are some suppliers wanting to supply certain amounts of some resource (eg loaves of bread) and some receivers wanting to receive a certain amount of that resource. There’s a cost associated to transporting a unit of the resource from a supplier to a receiver, this cost is fixed for each pair of supplier and receiver. You want to plan how much resource to send from each supplier to each receiver so as to minimize the total cost.

I then said that finding this minimum cost was equivalent to finding the maximum revenue of a different problem, where someone, for each supplier, fixes a price that they’re willing to pay for a unit of the resource and, for each receiver, they fix a price that they’ll charge for a unit of resource. These prices are subject to a constraint from the original transportation cost. You’ll see again what that is in the post below.

This time I want to show how this second problem is the ‘linear programming dual’ of the first, ‘primal’ problem – this general theory of such linear programming duality is likely to be in every undergraduate course on optimization. I’ll end with some extra comments on general programming duality.

#### The primal linear programming problem

One formulation of a standard linear programming problem is that, for some given sets of constants $\{b_i\}$, $\{c_j\}$ and $\{A_{i j}\}$, you wish to find non-negative quantities $x_1, \dots, x_n$ which minimize the sum

$z\coloneqq c_1 x_1 + c_2 x_2 +\dots + c_n x_n$

subject to the inequality constraints

$\begin{aligned} A_{11}x_1 + \dots + A_{1n}x_n &\ge b_1\\ \vdots\quad +\dots+\quad\vdots\ \ \quad &\ \quad \vdots \\ A_{m1}x_1 + \dots + A_{m n}x_n &\ge b_m.\\ \end{aligned}$

We can phrase this problem more compactly as follows: given an $m$-by-$n$ matrix $A\in \mathrm{M}_{m,n}(\mathbb{R})$ and two vectors $c\in \mathbb{R}^n$, $b\in \mathbb{R}^m$ we wish to find a vector $x\in \mathbb{R}^n_{\ge 0}$ which minimizes the cost

$z\coloneqq c^{\T} x\in \mathbb{R}$

subject to the constraint

$A x\ge b ,$

where this last inequality is interpreted componentwise. Similarly, I’ll sometimes write $x\ge 0$ for $x\in \mathbb{R}^n_{\ge 0}$.

People generally say $x\in \mathbb{R}^n_{\ge 0}$ is a **feasible solution** if it satisfies the constraint. Of course, there might not be any feasible solutions, eg. if $m=n=1$ with the constraint $-x_1\ge 1$.

We can write $z^\ast$ for the minimum cost over all feasible solutions and $x^\ast$ for an optimal solution.

The eagle-eyed will have seen that I shouldn’t really say *minimum* here, I should really say *greatest lower bound*, as either there may be no feasible solutions, or the cost function might be *unbounded* on the feasible set, eg. minimize $-x_1$ subject to $x_1\ge 0$. So an optimal solution $x^\ast$ might not exist for either of those two reasons, and if it does exist then it might not be unique.

#### The dual linear programming problem

Given the data $A$, $c$ and $b$ as above, we can form the **dual problem** which is to find a $y\in \mathbb{R}^m_{\ge 0}$ which *maximizes* the value

$w\coloneqq y^{\T}b$

subject to the constraint

$y^{\T}A\le c^{\T}.$

Again, people say $y\in \mathbb{R}^m_{\ge 0}$ is a **feasible solution** if it satisfies the constraint.

Similar to above, write $w^\ast$ for the maximum value and $y^\ast$ for a maximizer, or optimal solution.

Note that in the dual problem you have a variable $y_i$ for each *constraint* in the primal problem, and a constraint $\sum y_i A_{i j}\le c_j$ for each *variable* $x_j$ in the primal problem.

#### Linear programming duality

It is easy to see that any feasible solution $y\in \mathbb{R}^m_{\ge 0}$ of the dual problem gives a value $w$ which is a lower bound for the cost $z$ of any feasible solution $x\in \mathbb{R}^n_{\ge 0}$ of the primal problem. This is simply because

$w = y^{\T}b\le y^{\T}A x\le c^{\T}x =z.$

This immediately gives **weak linear programming duality** which is that the maximum value $w^\ast$ of the dual problem is a lower bound for the maximum cost $z^\ast$ of the primal problem:

$w^\ast = \sup_y w \le \inf_x z = z^\ast.$

In fact, we have a stronger result.

Strong Linear Programming Duality.Given the data $A$, $c$ and $b$ as above we have that if either the primal or the dual problem have an optimal solution then both have one, and the maximum value of the dual problem isequal tothe minimum cost of the primal problem, ie. $w^\ast=z^\ast$.

Perhaps unsuprisingly, given how I’ve written this, strong duality is a lot harder to prove. One way to prove it involves the use of a hyperplane separation theorem – this is hinting at the fact that the convexity of the set of feasible solutions, makes things nice in this situation. I won’t go into the proof here.

### Deriving the dual transport problem

I’m now in a position to show you how to derive the dual optimal transport problem from last time. I’ll rewrite the optimal transport problem in the standard form of a linear programming problem as above and then read off the dual problem.

#### The primal problem

Recall that given a transport cost matrix $k\in M_{r,s}(\mathbb{R})$ with supply vector $\sigma\in \mathbb{R}^s$ and demand vector $\rho\in \mathbb{R}^r$, we wish to minimize the transport cost

$\sum_{i=1,j=1}^{r,s}k_{i,j} \alpha_{i,j}$

subject to the supply and demand constraints

$\begin{aligned} -\sum_{j=1}^s \alpha_{i,j} &\ge -\sigma_i &\text{for}\ &i=1,\dots,r\\ \sum_{1=1}^r \alpha_{i,j} &\ge \rho_j &\text{for }\ &j=1,\dots,s. \end{aligned}$

Note that I’ve used negation to reverse the supply constraint.

Now we define the following

$x\coloneqq\left(\begin{smallmatrix}\alpha_{1,1}\\ \alpha_{2,1}\\\vdots \\ \alpha_{r,1}\\ \alpha_{1,2} \\\vdots \\ \alpha_{r-1, s}\\ \alpha_{r,s}\end{smallmatrix}\right)\in \mathbb{R}^{rs}; \qquad c\coloneqq \left(\begin{smallmatrix} k_{1,1}\\ k_{2,1}\\\vdots \\ k_{r,1}\\ k_{1,2} \\\vdots \\ k_{r-1, s}\\ k_{r,s} \end{smallmatrix}\right) \in \mathbb{R}^{rs}; \qquad b\coloneqq \left( \begin{smallmatrix} -\sigma_1\\ -\sigma_2\\ -\sigma_3\\ \vdots\\ -\sigma_s\\ \rho_1\\ \rho_2\\ \rho_3\\ \vdots\\ \rho_r \end{smallmatrix} \right) \in \mathbb{R}^{r+s};$

$A\coloneqq\left( \begin{smallmatrix} -1& 0&0&\dots& 0&-1&0 &0&\dots&0&-1&0&0&\dots&\dots &0 &0\\ 0&-1& 0&\dots& 0&0&-1 &0&\dots&0&0&-1&0&\dots&\dots &0& 0\\ 0&0&-1&\dots& 0&0&0&-1 &\dots&0&0&0&-1&\dots&\dots &0 &0\\ \vdots&\vdots&\vdots&&\vdots&\vdots&\vdots&\vdots&&\vdots&\vdots&\vdots&\vdots&& &\vdots&\vdots \\ 0&0&0&\dots& -1&0&0&0 &\dots&-1&0&0&0&\dots&\dots &0 &-1\\ 1&1&1&\dots& 1&0&0&0 &\dots&0&0&0&0&\dots &\dots &0&0\\ 0&0&0&\dots& 0&1&1&1 &\dots&1&0&0&0&\dots &\dots &0&0\\ 0&0&0&\dots &0&0&0&0&\dots& 0&1&1&1 &\dots&\dots &0&0\\ \vdots&\vdots&\vdots&&\vdots&\vdots&\vdots&\vdots&&\vdots&\vdots&\vdots&\vdots&& &\vdots&\vdots \\ 0&0&0&\dots &0&0&0&0&\dots&0&0&0&0&\dots&\dots &1&1\\ \end{smallmatrix} \right) \in \M_{r+s,rs}(\mathbb{R}).$

Then our transport problem is precisely find $x\in \mathbb{R}^{rs}_{\ge 0}$ which minimizes $c^{\mathrm{T}}x$ subject to $A x\ge b$. This is in the standard form so we can now write down the dual problem.

#### The dual problem

The dual problem is then immediately written down: given $A$, $b$ and $c$ as above, maximize $y^{\mathrm{T}}b$ over $y\in \mathbb{R}^{r+s}_{\ge 0}$ subject to $y^{\mathrm{T}}A\le c$. Writing $y^{\mathrm{T}}\coloneqq (v_1, v_2,\dots, v_s,u_1,u_2,\dots,u_r)$, this becomes the following.

Maximize

$\sum_{i=1}^r u_i \rho_i- \sum_{j=1}^s v_j\sigma_j$

subject to

$\begin{aligned} u_i,v_j&\in \mathbb{R}_{\ge 0}\ &\text{for all}\ &i,j,\\ u_i-v_j&\le k_{i j}\ &\text{for all}\ &i,j. \end{aligned}$

This is precisely the dual optimal transport problem as I gave last time.

Next time, I want to explain how you get a duality between the prices $v_1, v_2,\dots, v_s$ at suppliers/bakeries, and the prices $u_1, u_2,\dots, u_r$ at receivers/cafés. This will lead into a connection with the profunctor nucleus that I’ve posted about before.

I will end now with a digression that grew out of a reply to a question of Tom’s from the last post.

### Appendix: general Lagrangian programming duality

This part is not strictly part of the narative of this series of posts, so feel free to skip it. I will mention a more general type of duality for optimization problems here, leave you to show how it generalizes the linear programming duality above and I will give some hints as to how this might tie in to enriched category theory. (The problem I’ll state will have non-negative input variables and inequality constraints, but the analysis below easily adapts to having inputs ranging over all of $\mathbb{R}$ and having equality constraints. The case I give is just the nicest formulation for the linear programming duality example I’m interested in in this post.)

Suppose you want to minimize an arbitrary function $f(x)$ over $x\in \mathbb{R}^n_{\ge 0}$ subject to the arbitrary constraints $g_i(x)\le 0$ for $i=1,\dots m$. Equivalently, you could consider minimizing the following function over the whole of $\mathbb{R}^n_{\ge 0}$, ie. without any constraints:

$F(x) = \begin{cases}f(x)\ &\text{if}\ g_i(x)\le 0\ \text{for}\ i =1,\dots, m\\ \infty& \text{otherwise.} \end{cases}$

Another way of expressing this function is via $F(x) = f(x)+\mathcal{I}_g(x)$ where $\mathcal{I}_g$ is the indicator function on the feasible set – the subset of points satisfying the constraints – meaning $\mathcal{I}_g$ is $0$ on the feasible set and $\infty$ off it.

It shouldn’t take you long to see that you can write the indicator function $\mathcal{I}_g$ in the following form, because of the nature of the constraints:

$\mathcal{I}_g(x)\ =\ \sup_{y_i\ge 0}\sum_{i=1}^m y_i g_i(x)\ =\ \sup_{y\ge 0} (y^\mathrm{T} g(x)) \ =\ \begin{cases}0\ &\text{if}\ g_i(x)\le 0\ \text{for}\ i =1,\dots, m\\ \infty\ & \text{otherwise}, \end{cases}$

where I’ve written $g(x)$ for the vector $(g_1,\dots,g_m)^\mathrm{T}$.

This gives that the function you are trying to minimize over $x\in \mathbb{R}^n_{\ge 0}$ is

$F(x)=\sup_{y\ge 0}\bigl ( f(x) + y^\mathrm{T} g(x)).$

You can define the **Lagrangian function** $\mathcal{L}(x,y)\coloneqq f(x) + y^\mathrm{T} g(x)$. So what you wish to find is

$\inf_{x\ge 0} F(x)= \inf_{x\ge 0}\, \sup_{y\ge 0} \mathcal{L}(x,y).$

And by the max-min inequality you get

$\inf_{x\ge 0}\, \sup_{y\ge 0} \mathcal{L}(x,y) \ge \sup_{y\ge 0} \,\inf_{x\ge 0} \mathcal{L}(x,y).$

If the Lagrangian function is particularly nice in some sense then you will get an equality here and you can say that a minimax theorem holds. For instance, you might expect this to happen when you have the cost function $f$ and the constraints being convex in some sense.

You are perhaps led at this point to define the **Lagrangian dual function** as follows:

$D(y)\ \coloneqq \ \inf_{x\ge 0} \mathcal{L}(x,y)\ =\ \inf_{x\ge 0}(f(x)+ y^\mathrm{T}g(x)).$

Then the **Lagrangian dual problem** to the original constrained problem is to maximize $D(y)$ for $y\in \mathbb{R}^m_{\ge 0}$. As we’ve seen by the max-min inequality,

$\inf_{x\ge 0} F(x)\ge \sup_{y\ge 0} D(y),$

this means that the maximum of the dual problem is a lower bound for the minimum of the original constrained problem: this is **weak duality**. In nice cases, where the minimax theorem holds, the optima of the two problems will be equal: this is **strong duality**.

Before I go on to what this might all have to do with enriched category theory, you should check that this ties into the linear programming duality that this post is supposed to be about! Feel free to put your answer in the comments.

Nice, easy exercise.For the primal linear programming problem “Minimize $c^{\mathrm{T}}x$ for $x\in \mathbb{R}^n_{\ge 0}$ subject to $A x\ge b$.” find the Lagrangian function $\mathcal{L}(x,y)$ and the Lagrangian dual function $D(y)$. Show that the Lagrangian dual problem “Maximize $D(y)$ for $y\in \mathbb{R}^m_{\ge 0}$.” is equivalent to the dual linear programming problem “Maximize $y^{\mathrm{T}}b$ for $y\in \mathbb{R}^m_{\ge 0}$ subject to $y^\mathrm{T} A \le c$.” from earlier in this post.

Now, how should this Lagrangian duality picture relate to enriched category theory? What you will see in the next few posts is that you should be thinking in terms of enriching over the monoidal category $(\bar{\mathbb{R}}, +)$ of extended real numbers, where an object is either a real number or $\pm \infty$, and there is a unique morphism $a\to b$ precisely when $a\ge b$. In this category coproducts are infima and products are suprema. You should think of $\mathbb{R}^n_{\ge 0}$ and $\mathbb{R}^m_{\ge 0}$ as discrete $\bar {\mathbb{R}}$-categories and functions on them as functors (or presheaves if you like).

Viewed in this way, the max-min inequality

$\inf_{x\ge 0} \,\sup_{y\ge 0} \mathcal{L}(x,y) \ge \sup_{y\ge 0} \,\inf_{x\ge 0} \mathcal{L}(x,y)$

is a manifestation of the canonical morphism that exists between expressions with a limit and colimit interchanged: for a functor $\mathcal{L} \colon X \times Y \to R$, there is a canonical morphism

$\mathrm{colim}_X \mathrm{lim}_Y \mathcal{L} \to \mathrm{lim}_Y \mathrm{colim}_X \mathcal{L},$

see the nLab. Similarly the minimax theorem holding is perhaps to be thought of as a statement about certain colimits and limits commuting.

Furthermore, the function $g\colon \mathbb{R}^n\to \mathbb{R}^m$ gives us the pairing $P_g\colon \mathbb{R}^n\times \mathbb{R}^m\to \mathbb{R}$ given by $P_g(x,y)\coloneqq y^\mathrm{T} g(x)$. This can be viewed as a profunctor. In the usual way of things profunctorial, which we’ll see more of in this series of posts, this profunctor gives rise to a functor from functors (presheaves) on $\mathbb{R}^n$ to functors on $\mathbb{R}^m$ given by a coend $f\mapsto \int^x f(x)\otimes P_g(x, {-})$, which translated back into the language here is $f\mapsto \inf_x (f(x) + P_g(x,{-}))$ which is precisely the definition of the Lagrangian dual function.

At the moment, it is somewhat speculative whether there’s anything deeper categorical going on here, and, as I intimated, this is a slight digression from the main thread of this series of posts. So I will stop here for now.

## Re: Optimal Transport and Enriched Categories II

Transportation problems have the special property that if the supply and demand conditions are integers, then there is an optimal assignment which is also integral. (See, for example, Kathleen Trustrum,

Linear Programming(1971) p.35). Is this reflected in the categorical point of view?