

June 19, 2021

Optimal Transport and Enriched Categories II

Posted by Simon Willerton

The dual transport problem from general linear programming duality

Last time I described the basic, discrete optimal transport problem, in which there are some suppliers wanting to supply certain amounts of some resource (eg loaves of bread) and some receivers wanting to receive a certain amount of that resource. There's a cost associated to transporting a unit of the resource from a supplier to a receiver, and this cost is fixed for each pair of supplier and receiver. You want to plan how much resource to send from each supplier to each receiver so as to minimize the total cost.

I then said that finding this minimum cost was equivalent to finding the maximum revenue of a different problem, where someone, for each supplier, fixes a price that they’re willing to pay for a unit of the resource and, for each receiver, they fix a price that they’ll charge for a unit of resource. These prices are subject to a constraint from the original transportation cost. You’ll see again what that is in the post below.

This time I want to show how this second problem is the ‘linear programming dual’ of the first, ‘primal’ problem – this general theory of such linear programming duality is likely to be in every undergraduate course on optimization. I’ll end with some extra comments on general programming duality.

The primal linear programming problem

One formulation of a standard linear programming problem is that, for some given sets of constants $\{b_i\}$, $\{c_j\}$ and $\{A_{i j}\}$, you wish to find non-negative quantities $x_1, \dots, x_n$ which minimize the sum

$$z\coloneqq c_1 x_1 + c_2 x_2 +\dots + c_n x_n$$

subject to the inequality constraints

$$\begin{aligned} A_{11}x_1 + \dots + A_{1n}x_n &\ge b_1\\ &\ \ \vdots\\ A_{m1}x_1 + \dots + A_{m n}x_n &\ge b_m. \end{aligned}$$

We can phrase this problem more compactly as follows: given an $m$-by-$n$ matrix $A\in \mathrm{M}_{m,n}(\mathbb{R})$ and two vectors $c\in \mathbb{R}^n$, $b\in \mathbb{R}^m$, we wish to find a vector $x\in \mathbb{R}^n_{\ge 0}$ which minimizes the cost

$$z\coloneqq c^{\mathrm{T}} x\in \mathbb{R}$$

subject to the constraint

$$A x\ge b,$$

where this last inequality is interpreted componentwise. Similarly, I'll sometimes write $x\ge 0$ for $x\in \mathbb{R}^n_{\ge 0}$.

People generally say $x\in \mathbb{R}^n_{\ge 0}$ is a feasible solution if it satisfies the constraint. Of course, there might not be any feasible solutions, eg. if $m=n=1$ with the constraint $-x_1\ge 1$.

We can write $z^\ast$ for the minimum cost over all feasible solutions and $x^\ast$ for an optimal solution.

The eagle-eyed will have seen that I shouldn't really say minimum here, I should really say greatest lower bound, as either there may be no feasible solutions, or the cost function might be unbounded below on the feasible set, eg. minimize $-x_1$ subject to $x_1\ge 0$. So an optimal solution $x^\ast$ might not exist for either of those two reasons, and if it does exist then it might not be unique.
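To make the standard form concrete, here is a small pure-Python sketch; the helper names `cost` and `is_feasible` are mine, not from the post, and the instances are toy examples.

```python
# Sketch of the standard-form problem: minimize c.x subject to A x >= b, x >= 0.
# Helper names (cost, is_feasible) are illustrative, not from the post.

def cost(c, x):
    """The objective z = c^T x."""
    return sum(cj * xj for cj, xj in zip(c, x))

def is_feasible(A, b, x):
    """True if x >= 0 componentwise and A x >= b componentwise."""
    if any(xj < 0 for xj in x):
        return False
    return all(sum(Aij * xj for Aij, xj in zip(row, x)) >= bi
               for row, bi in zip(A, b))

# The m = n = 1 example from the text: the constraint -x_1 >= 1 has no
# feasible solution, since x_1 >= 0 forces -x_1 <= 0.
A_bad, b_bad = [[-1.0]], [1.0]
assert not any(is_feasible(A_bad, b_bad, [t / 10]) for t in range(100))

# A feasible toy instance: minimize x_1 + x_2 subject to x_1 + x_2 >= 1.
A1, b1, c1 = [[1.0, 1.0]], [1.0], [1.0, 1.0]
assert is_feasible(A1, b1, [0.5, 0.5])
assert cost(c1, [0.5, 0.5]) == 1.0
```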

The dual linear programming problem

Given the data $A$, $c$ and $b$ as above, we can form the dual problem, which is to find a $y\in \mathbb{R}^m_{\ge 0}$ which maximizes the value

$$w\coloneqq y^{\mathrm{T}}b$$

subject to the constraint

$$y^{\mathrm{T}}A\le c^{\mathrm{T}}.$$

Again, people say $y\in \mathbb{R}^m_{\ge 0}$ is a feasible solution if it satisfies the constraint.

Similarly to the above, write $w^\ast$ for the maximum value and $y^\ast$ for a maximizer, or optimal solution.

Note that in the dual problem you have a variable $y_i$ for each constraint in the primal problem, and a constraint $\sum_i y_i A_{i j}\le c_j$ for each variable $x_j$ in the primal problem.
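This bookkeeping, swapping the roles of $b$ and $c$ and transposing $A$, can be sketched in a few lines of Python; the helper name `dualize` is mine. A pleasant consequence of the symmetry is that dualizing twice returns the data you started with.

```python
# Sketch of the variable/constraint swap described above (name `dualize` is mine):
# from 'min c.x s.t. A x >= b, x >= 0' we pass to 'max y.b s.t. y^T A <= c^T, y >= 0',
# returned here as the data (A^T, c, b) of a problem of the same shape.

def dualize(A, b, c):
    A_T = [list(col) for col in zip(*A)]   # transpose: one dual constraint per x_j
    return A_T, c, b                       # b and c trade places

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # m = 3 constraints, n = 2 variables
b = [1.0, 1.0, 1.0]
c = [7.0, 8.0]

A_d, b_d, c_d = dualize(A, b, c)
assert len(A_d) == 2 and len(A_d[0]) == 3  # n constraints, m variables in the dual
A_dd, b_dd, c_dd = dualize(A_d, b_d, c_d)
assert (A_dd, b_dd, c_dd) == (A, b, c)     # dualizing twice recovers the data
```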

Linear programming duality

It is easy to see that any feasible solution $y\in \mathbb{R}^m_{\ge 0}$ of the dual problem gives a value $w$ which is a lower bound for the cost $z$ of any feasible solution $x\in \mathbb{R}^n_{\ge 0}$ of the primal problem. This is simply because

$$w = y^{\mathrm{T}}b\le y^{\mathrm{T}}A x\le c^{\mathrm{T}}x = z.$$

This immediately gives weak linear programming duality, which says that the maximum value $w^\ast$ of the dual problem is a lower bound for the minimum cost $z^\ast$ of the primal problem:

$$w^\ast = \sup_y w \le \inf_x z = z^\ast.$$
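Here is a numerical illustration of the chain of inequalities above, on a toy instance of my own (not from the post): any dual-feasible $y$ certifies a lower bound $y^{\mathrm{T}}b \le c^{\mathrm{T}}x$ for any primal-feasible $x$. When the two values coincide, both solutions must be optimal.

```python
# Weak duality check on a toy instance (my own example):
# minimize 3 x1 + 2 x2 subject to x1 + x2 >= 4, x1 >= 1, x >= 0.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def mat_vec(A, x):
    return [dot(row, x) for row in A]

A = [[1.0, 1.0], [1.0, 0.0]]
b = [4.0, 1.0]
c = [3.0, 2.0]

x = [1.0, 3.0]          # primal feasible: A x = [4, 1] >= b
y = [2.0, 1.0]          # dual feasible: y^T A = [3, 2] <= c^T

assert all(lhs >= bi for lhs, bi in zip(mat_vec(A, x), b))
assert all(lhs <= cj for lhs, cj in zip(mat_vec(list(zip(*A)), y), c))

w = dot(y, b)           # 2*4 + 1*1 = 9
z = dot(c, x)           # 3*1 + 2*3 = 9
assert w <= z           # weak duality; equality here certifies both as optimal
```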

In fact, we have a stronger result.

Strong Linear Programming Duality. Given the data $A$, $c$ and $b$ as above, if either the primal or the dual problem has an optimal solution then both have one, and the maximum value of the dual problem is equal to the minimum cost of the primal problem, ie. $w^\ast=z^\ast$.

Perhaps unsurprisingly, given how I've written this, strong duality is a lot harder to prove. One way to prove it involves the use of a hyperplane separation theorem – this hints at the fact that the convexity of the set of feasible solutions makes things nice in this situation. I won't go into the proof here.

Deriving the dual transport problem

I’m now in a position to show you how to derive the dual optimal transport problem from last time. I’ll rewrite the optimal transport problem in the standard form of a linear programming problem as above and then read off the dual problem.

The primal problem

Recall that given a transport cost matrix $k\in \mathrm{M}_{r,s}(\mathbb{R})$, with supply vector $\sigma\in \mathbb{R}^s$ and demand vector $\rho\in \mathbb{R}^r$ – so $i=1,\dots,r$ indexes the receivers and $j=1,\dots,s$ the suppliers – we wish to minimize the transport cost

$$\sum_{i=1}^{r}\sum_{j=1}^{s} k_{i,j}\, \alpha_{i,j}$$

subject to the supply and demand constraints

$$\begin{aligned} -\sum_{i=1}^r \alpha_{i,j} &\ge -\sigma_j &\text{for }\ &j=1,\dots,s\\ \sum_{j=1}^s \alpha_{i,j} &\ge \rho_i &\text{for }\ &i=1,\dots,r. \end{aligned}$$

Note that I’ve used negation to reverse the supply constraint.

Now we define the following

$$x\coloneqq\begin{pmatrix}\alpha_{1,1}\\ \alpha_{2,1}\\ \vdots\\ \alpha_{r,1}\\ \alpha_{1,2}\\ \vdots\\ \alpha_{r,s}\end{pmatrix}\in \mathbb{R}^{rs};\qquad c\coloneqq\begin{pmatrix}k_{1,1}\\ k_{2,1}\\ \vdots\\ k_{r,1}\\ k_{1,2}\\ \vdots\\ k_{r,s}\end{pmatrix}\in \mathbb{R}^{rs};\qquad b\coloneqq\begin{pmatrix}-\sigma_1\\ \vdots\\ -\sigma_s\\ \rho_1\\ \vdots\\ \rho_r\end{pmatrix}\in \mathbb{R}^{r+s};$$

$$A\coloneqq\begin{pmatrix} -\mathbf{1}_r^{\mathrm{T}} & & & \\ & -\mathbf{1}_r^{\mathrm{T}} & & \\ & & \ddots & \\ & & & -\mathbf{1}_r^{\mathrm{T}} \\ I_r & I_r & \cdots & I_r \end{pmatrix}\in \mathrm{M}_{r+s,rs}(\mathbb{R}),$$

where $\mathbf{1}_r^{\mathrm{T}}$ is the length-$r$ row vector $(1,\dots,1)$, $I_r$ is the $r\times r$ identity matrix and the blank entries are zero: the first $s$ rows encode the supply constraints and the last $r$ rows the demand constraints.

Then our transport problem is precisely: find $x\in \mathbb{R}^{rs}_{\ge 0}$ which minimizes $c^{\mathrm{T}}x$ subject to $A x\ge b$. This is in the standard form, so we can now write down the dual problem.
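The vectorization can be sketched directly in code; this is my own helper, not from the post, with $\alpha_{i,j}$ sitting at position $i + (j-1)r$ of $x$, the $s$ supply rows stacked on top of the $r$ demand rows.

```python
# Sketch (helper name mine) of putting the transport problem into standard LP form.

def transport_to_standard_form(k, sigma, rho):
    """k[i][j]: cost from supplier j to receiver i; returns (A, b, c)."""
    r, s = len(k), len(k[0])
    # c lists the entries of k column by column, matching the ordering of x.
    c = [k[i][j] for j in range(s) for i in range(r)]
    b = [-sj for sj in sigma] + list(rho)
    A = []
    for j in range(s):          # supply row j: -sum_i alpha_{i,j} >= -sigma_j
        row = [0.0] * (r * s)
        for i in range(r):
            row[j * r + i] = -1.0
        A.append(row)
    for i in range(r):          # demand row i: sum_j alpha_{i,j} >= rho_i
        row = [0.0] * (r * s)
        for j in range(s):
            row[j * r + i] = 1.0
        A.append(row)
    return A, b, c

# Tiny check with r = s = 2.
k = [[1.0, 2.0], [3.0, 1.0]]
A, b, c = transport_to_standard_form(k, sigma=[2.0, 2.0], rho=[1.0, 3.0])
assert len(A) == 4 and len(A[0]) == 4
assert A[0] == [-1.0, -1.0, 0.0, 0.0]     # first supply constraint
assert c == [1.0, 3.0, 2.0, 1.0]          # k_{1,1}, k_{2,1}, k_{1,2}, k_{2,2}
assert b == [-2.0, -2.0, 1.0, 3.0]
```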

The dual problem

The dual problem can then be written down immediately: given $A$, $b$ and $c$ as above, maximize $y^{\mathrm{T}}b$ over $y\in \mathbb{R}^{r+s}_{\ge 0}$ subject to $y^{\mathrm{T}}A\le c^{\mathrm{T}}$. Writing $y^{\mathrm{T}}\coloneqq (v_1, v_2,\dots, v_s,u_1,u_2,\dots,u_r)$, this becomes the following.


Maximize

$$\sum_{i=1}^r u_i \rho_i - \sum_{j=1}^s v_j\sigma_j$$

subject to

$$\begin{aligned} u_i,v_j&\in \mathbb{R}_{\ge 0}\ &\text{for all}\ &i,j,\\ u_i-v_j&\le k_{i,j}\ &\text{for all}\ &i,j. \end{aligned}$$

This is precisely the dual optimal transport problem as I gave last time.
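With integer data we can brute-force both problems on a tiny instance (my own example, not from the post) and watch the primal minimum and the dual maximum agree, as strong duality promises.

```python
from itertools import product

# Brute-force check of strong duality on a tiny transport instance (my example).
# Supplier j holds sigma[j] loaves, receiver i wants rho[i]; k[i][j] is the
# unit cost from supplier j to receiver i.
k = [[1, 2], [3, 1]]
sigma = [2, 2]
rho = [1, 3]

def primal_min():
    best = None
    # Each alpha[i][j] in 0..2 suffices, since no supplier holds more than 2.
    for a in product(range(3), repeat=4):
        alpha = [[a[0], a[1]], [a[2], a[3]]]
        if any(sum(alpha[i][j] for i in range(2)) > sigma[j] for j in range(2)):
            continue                      # supply exceeded
        if any(sum(alpha[i][j] for j in range(2)) < rho[i] for i in range(2)):
            continue                      # demand unmet
        c = sum(k[i][j] * alpha[i][j] for i in range(2) for j in range(2))
        best = c if best is None else min(best, c)
    return best

def dual_max():
    best = None
    for u1, u2, v1, v2 in product(range(5), repeat=4):
        u, v = [u1, u2], [v1, v2]
        if any(u[i] - v[j] > k[i][j] for i in range(2) for j in range(2)):
            continue                      # price constraint u_i - v_j <= k_{i,j}
        val = (sum(u[i] * rho[i] for i in range(2))
               - sum(v[j] * sigma[j] for j in range(2)))
        best = val if best is None else max(best, val)
    return best

assert primal_min() == dual_max() == 6    # strong duality holds here
```

(Searching integer prices suffices here precisely because of the integrality phenomenon for transportation problems that comes up in the comments below.)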

Next time, I want to explain how you get a duality between the prices $v_1, v_2,\dots, v_s$ at suppliers/bakeries and the prices $u_1, u_2,\dots, u_r$ at receivers/cafés. This will lead into a connection with the profunctor nucleus that I've posted about before.

I will end now with a digression that grew out of a reply to a question of Tom’s from the last post.

Appendix: general Lagrangian programming duality

This part is not strictly part of the narrative of this series of posts, so feel free to skip it. I will mention a more general type of duality for optimization problems here, leave you to show how it generalizes the linear programming duality above, and give some hints as to how this might tie in to enriched category theory. (The problem I'll state will have non-negative input variables and inequality constraints, but the analysis below easily adapts to inputs ranging over all of $\mathbb{R}$ and to equality constraints. The case I give is just the nicest formulation for the linear programming duality example I'm interested in in this post.)

Suppose you want to minimize an arbitrary function $f(x)$ over $x\in \mathbb{R}^n_{\ge 0}$ subject to the arbitrary constraints $g_i(x)\le 0$ for $i=1,\dots,m$. Equivalently, you could consider minimizing the following function over the whole of $\mathbb{R}^n_{\ge 0}$, ie. without any constraints:

$$F(x) = \begin{cases}f(x)\ &\text{if}\ g_i(x)\le 0\ \text{for}\ i =1,\dots, m\\ \infty& \text{otherwise.} \end{cases}$$

Another way of expressing this function is via $F(x) = f(x)+\mathcal{I}_g(x)$, where $\mathcal{I}_g$ is the indicator function of the feasible set – the subset of points satisfying the constraints – meaning $\mathcal{I}_g$ is $0$ on the feasible set and $\infty$ off it.

It shouldn't take you long to see that you can write the indicator function $\mathcal{I}_g$ in the following form, because of the nature of the constraints:

$$\mathcal{I}_g(x)\ =\ \sup_{y_i\ge 0}\sum_{i=1}^m y_i g_i(x)\ =\ \sup_{y\ge 0} \bigl(y^{\mathrm{T}} g(x)\bigr)\ =\ \begin{cases}0\ &\text{if}\ g_i(x)\le 0\ \text{for}\ i =1,\dots, m\\ \infty\ & \text{otherwise,} \end{cases}$$

where I've written $g(x)$ for the vector $(g_1(x),\dots,g_m(x))^{\mathrm{T}}$.

This gives that the function you are trying to minimize over $x\in \mathbb{R}^n_{\ge 0}$ is

$$F(x)=\sup_{y\ge 0}\bigl( f(x) + y^{\mathrm{T}} g(x)\bigr).$$

You can define the Lagrangian function $\mathcal{L}(x,y)\coloneqq f(x) + y^{\mathrm{T}} g(x)$. So what you wish to find is

$$\inf_{x\ge 0} F(x)= \inf_{x\ge 0}\, \sup_{y\ge 0}\, \mathcal{L}(x,y).$$

And by the max-min inequality you get

$$\inf_{x\ge 0}\, \sup_{y\ge 0}\, \mathcal{L}(x,y) \ge \sup_{y\ge 0}\, \inf_{x\ge 0}\, \mathcal{L}(x,y).$$

If the Lagrangian function is particularly nice in some sense then you get equality here, and you can say that a minimax theorem holds. For instance, you might expect this to happen when the cost function $f$ and the constraints are convex in some sense.

You are perhaps led at this point to define the Lagrangian dual function as follows:

$$D(y)\ \coloneqq\ \inf_{x\ge 0} \mathcal{L}(x,y)\ =\ \inf_{x\ge 0}\bigl(f(x)+ y^{\mathrm{T}}g(x)\bigr).$$

Then the Lagrangian dual problem to the original constrained problem is to maximize $D(y)$ over $y\in \mathbb{R}^m_{\ge 0}$. As we've seen, by the max-min inequality,

$$\inf_{x\ge 0} F(x)\ge \sup_{y\ge 0} D(y),$$

and this means that the maximum of the dual problem is a lower bound for the minimum of the original constrained problem: this is weak duality. In nice cases, where the minimax theorem holds, the optima of the two problems will be equal: this is strong duality.
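In the linear programming case the Lagrangian dual function can be computed in closed form, and the sketch below (helper name mine, toy data mine) spells this out: $\mathcal{L}(x,y) = c^{\mathrm{T}}x + y^{\mathrm{T}}(b - Ax)$, so $D(y) = y^{\mathrm{T}}b + \inf_{x\ge 0}(c - A^{\mathrm{T}}y)^{\mathrm{T}}x$, which is $y^{\mathrm{T}}b$ when $y^{\mathrm{T}}A \le c^{\mathrm{T}}$ and $-\infty$ otherwise. It rather gives the game away for the exercise below, so look away now if you want to work that out yourself.

```python
import math

# Closed-form Lagrangian dual function for the LP 'min c.x s.t. A x >= b, x >= 0'
# (helper name lagrangian_dual is mine, not from the post).

def lagrangian_dual(A, b, c, y):
    # Reduced costs: the coefficients of x in L(x, y) = y.b + (c - A^T y).x.
    reduced = [cj - sum(A[i][j] * y[i] for i in range(len(A)))
               for j, cj in enumerate(c)]
    if any(rj < 0 for rj in reduced):
        return -math.inf              # some x_j can be sent to +infinity
    return sum(yi * bi for yi, bi in zip(y, b))   # infimum attained at x = 0

# Toy instance: min 3 x1 + 2 x2 s.t. x1 + x2 >= 4, x1 >= 1, x >= 0.
A = [[1.0, 1.0], [1.0, 0.0]]
b = [4.0, 1.0]
c = [3.0, 2.0]

assert lagrangian_dual(A, b, c, [2.0, 1.0]) == 9.0          # y^T A <= c^T: D(y) = y^T b
assert lagrangian_dual(A, b, c, [5.0, 0.0]) == -math.inf    # y^T A > c^T: D(y) = -inf
```

So maximizing $D(y)$ over $y \ge 0$ is exactly maximizing $y^{\mathrm{T}}b$ subject to $y^{\mathrm{T}}A \le c^{\mathrm{T}}$.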

Before I go on to what this might all have to do with enriched category theory, you should check that this ties into the linear programming duality that this post is supposed to be about! Feel free to put your answer in the comments.

Nice, easy exercise. For the primal linear programming problem "Minimize $c^{\mathrm{T}}x$ for $x\in \mathbb{R}^n_{\ge 0}$ subject to $A x\ge b$", find the Lagrangian function $\mathcal{L}(x,y)$ and the Lagrangian dual function $D(y)$. Show that the Lagrangian dual problem "Maximize $D(y)$ for $y\in \mathbb{R}^m_{\ge 0}$" is equivalent to the dual linear programming problem "Maximize $y^{\mathrm{T}}b$ for $y\in \mathbb{R}^m_{\ge 0}$ subject to $y^{\mathrm{T}} A \le c^{\mathrm{T}}$" from earlier in this post.

Now, how should this Lagrangian duality picture relate to enriched category theory? What you will see in the next few posts is that you should be thinking in terms of enriching over the monoidal category $(\bar{\mathbb{R}}, +)$ of extended real numbers, where an object is either a real number or $\pm\infty$, and there is a unique morphism $a\to b$ precisely when $a\ge b$. In this category coproducts are infima and products are suprema. You should think of $\mathbb{R}^n_{\ge 0}$ and $\mathbb{R}^m_{\ge 0}$ as discrete $\bar{\mathbb{R}}$-categories and functions on them as functors (or presheaves if you like).

Viewed in this way, the max-min inequality

$$\inf_{x\ge 0} \,\sup_{y\ge 0}\, \mathcal{L}(x,y) \ge \sup_{y\ge 0} \,\inf_{x\ge 0}\, \mathcal{L}(x,y)$$

is a manifestation of the canonical morphism that exists between expressions with a limit and colimit interchanged: for a functor $\mathcal{L} \colon X \times Y \to R$, there is a canonical morphism

$$\mathrm{colim}_X\, \mathrm{lim}_Y\, \mathcal{L} \to \mathrm{lim}_Y\, \mathrm{colim}_X\, \mathcal{L},$$

see the nLab. Similarly, the minimax theorem holding is perhaps to be thought of as a statement about certain colimits and limits commuting.

Furthermore, the function $g\colon \mathbb{R}^n\to \mathbb{R}^m$ gives us the pairing $P_g\colon \mathbb{R}^n\times \mathbb{R}^m\to \mathbb{R}$ given by $P_g(x,y)\coloneqq y^{\mathrm{T}} g(x)$. This can be viewed as a profunctor. In the usual way of things profunctorial, which we'll see more of in this series of posts, this profunctor gives rise to a functor from functors (presheaves) on $\mathbb{R}^n$ to functors on $\mathbb{R}^m$ given by a coend $f\mapsto \int^x f(x)\otimes P_g(x, {-})$, which translated back into the language here is $f\mapsto \inf_x \bigl(f(x) + P_g(x,{-})\bigr)$, which is precisely the definition of the Lagrangian dual function.
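Concretely, over $(\bar{\mathbb{R}}, +)$ the coend formula is matrix-vector multiplication in min-plus arithmetic, with $+$ playing the role of multiplication and $\inf$ the role of summation. A small sketch with my own names, using finite sets standing in for $\mathbb{R}^n$ and $\mathbb{R}^m$:

```python
# Min-plus ("tropical") action of a pairing P on a function f:
# g(y) = inf over x of (f(x) + P(x, y)), the coend formula in this enriched setting.
# Names (min_plus_apply, P, f) are mine, for illustration.

def min_plus_apply(P, f):
    """Given P[x][y] and f[x], return g[y] = min over x of (f[x] + P[x][y])."""
    n, m = len(P), len(P[0])
    return [min(f[x] + P[x][y] for x in range(n)) for y in range(m)]

# P as a 'profunctor' between a 3-point and a 2-point discrete R-bar-category.
P = [[0.0, 5.0],
     [2.0, 1.0],
     [4.0, 0.0]]
f = [3.0, 1.0, 2.0]

g = min_plus_apply(P, f)
assert g == [3.0, 2.0]   # min(3+0, 1+2, 2+4) = 3 and min(3+5, 1+1, 2+0) = 2
```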

At the moment, it is somewhat speculative whether there's anything categorically deeper going on here, and, as I intimated, this is a slight digression from the main thread of this series of posts. So I will stop here for now.

Posted at June 19, 2021 9:49 PM UTC


Re: Optimal Transport and Enriched Categories II

Transportation problems have the special property that if the supply and demand conditions are integers, then there is an optimal assignment which is also integral. (See, for example, Kathleen Trustrum, Linear Programming (1971) p.35). Is this reflected in the categorical point of view?

Posted by: Richard Pinch on June 20, 2021 10:17 AM

Re: Optimal Transport and Enriched Categories II

I would hope that that were the case, but at the moment I have not thought about this sort of thing at all. I would hope that switching the basis of enrichment from the extended reals to the extended integers or extended natural numbers would do something interesting in terms of discrete optimization. The only thing I know in that area is Soichiro Fujii's Bachelor Thesis: A Categorical Approach to L-Convexity.

Posted by: Simon Willerton on June 22, 2021 10:14 AM

Re: Optimal Transport and Enriched Categories II

Thanks for that reference. I have occasionally wondered about the relation of combinatorial duality theorems such as Dilworth to their linear programming analogues. Murota’s book seems highly relevant.

Posted by: Richard Pinch on June 22, 2021 9:37 PM
