Dual Problem of Kantorovich
In this article, we will talk about the dual problem Kantorovich.
Let us express the constraint $\gamma \in \Pi(\mu,\nu)$ in the following way: if $\gamma \in M_+(X\times Y)$ then we have
$$
\sup_{\varphi,\psi}
\left[
\int_X \varphi \ d\mu
+\int_Y \psi \ d\nu
-\int_{X\times Y} (\varphi(x)+\psi(y)) \ d\gamma
\right]
=
\begin{cases}
0, & \text{if } \gamma \in \Pi(\mu,\nu),\\\
+\infty, & \text{otherwise.}
\end{cases}
$$
where $\varphi,\psi$ are bounded continuous.
Proof : If $\gamma \in \Pi(\mu,\nu)$, it is trivial. If $\gamma \notin \Pi(\mu,\nu)$, then there exists a pair $(\varphi_0,\psi_0)$ such that
$$
I(\varphi_0,\psi_0)
:=
\int_X \varphi_0 \ d\mu
+\int_Y \psi_0 \ d\nu
-\int_{X\times Y} (\varphi_0(x)+\psi_0(y)) \ d\gamma
\neq 0.
$$
W.L.O.G. we can assume $I>0$, then if we choose $(a\varphi_0,a\psi_0)$ for $a>0$,
$$
I(a\varphi_0,a\psi_0)=a I(\varphi_0,\psi_0).
$$
Hence
$$
\mathrm{LHS}=\sup_{\varphi,\psi} I(\varphi,\psi)=+\infty.
$$
Hence the Kantorovich problem is equivalent to the following problem
$$
\min_{\gamma \in M_+(X\times Y)}
\left\{
\int_{X\times Y} c \ d\gamma
+\sup_{\varphi,\psi}
\left[
\int_X \varphi \ d\mu
+\int_Y \psi \ d\nu
-\int_{X\times Y} (\varphi(x)+\psi(y)) \ d\gamma
\right]
\right\}.
$$
Consider interchanging $\sup$ and $\inf$
$$
\sup_{\varphi,\psi}
\left\{
\int_X \varphi \ d\mu
+\int_Y \psi \ d\nu
+\inf_{\gamma \in M_+(X\times Y)}
\left[
\int_{X\times Y} \bigl(c(x,y)-(\varphi(x)+\psi(y))\bigr)\ d\gamma
\right]
\right\}.
$$
Now we denote $(\varphi \oplus \psi)(x,y):=\varphi(x)+\psi(y)$, then the infimum in $\gamma$ can be written as
$$
\inf_{\gamma \in M_+(X\times Y)} \int_{X\times Y} (c-\varphi\oplus\psi)\ d\gamma
=
\begin{cases}
0, & \text{if } \varphi\oplus\psi \le c \text{ on } X\times Y,\\\
-\infty, & \text{otherwise.}
\end{cases}
$$
Proof : Let
$$
\widetilde{J}(\gamma):=\int_{X\times Y} (c-\varphi\oplus\psi)\ d\gamma.
$$
If $\varphi\oplus\psi \le c$ on $X\times Y$, then for every $\gamma \in M_+(X\times Y)$,
$$
\widetilde{J}(\gamma)\ge 0.
$$
Since for all $a>0$,
$$
\widetilde{J}(a\gamma)=a\widetilde{J}(\gamma),
$$
we have
$$
\inf_{\gamma \in M_+(X\times Y)} \widetilde{J}(\gamma)=0.
$$
If $\varphi+\psi \le c$ is not true, then for every $\gamma \in M_+(X\times Y)$, $\widetilde{J}(\gamma)<0$. Since for all $a>0$ we have $\widetilde{J}(a\gamma)=a\widetilde{J}(\gamma)$, we have
$$
\inf_{\gamma \in M_+(X\times Y)} \widetilde{J}(\gamma)=-\infty.
$$
Therefore, we get the following dual optimization problem:
Problem 1.3. Given $\mu \in \mathcal{P}(X)$, $\nu \in \mathcal{P}(Y)$ and the cost function $c:X\times Y \to [0,+\infty]$, we consider the problem
$$
(\mathrm{DP})\qquad
\sup
\left\{
\int_X \varphi \ d\mu + \int_Y \psi \ d\nu
\ \middle|\
\varphi \in C_b(X),\ \psi \in C_b(Y),\ \varphi\oplus\psi \le c
\right\}.
$$
Lemma 1.3. $\sup(\mathrm{DP}) \le \min(\mathrm{KP})$.
Proof : For $\gamma \in \Pi(\mu,\nu)$, then for every $\varphi \in C_b(X)$, $\psi \in C_b(Y)$ with $\varphi\oplus\psi \le c$, we have
$$
\int_X \varphi \ d\mu + \int_Y \psi \ d\nu
=
\int_{X\times Y} \varphi\oplus\psi \ d\gamma
\le
\int_{X\times Y} c \ d\gamma.
$$
Definition 1.1. Given a function $\chi:X\to \overline{\mathbb R}$, we define its $c$-transform (also called $c$-conjugate function)
$$
\chi^c:Y\to \overline{\mathbb R}
$$
by
$$
\chi^c(y):=\inf_{x\in X}\ [c(x,y)-\chi(x)].
$$
We also define the $\overline{c}$-transform of $\xi:Y\to \overline{\mathbb R}$ by
$$
\xi^{\overline{c}}:X\to \overline{\mathbb R}
$$
$$
\xi^{\overline{c}}(x):=\inf_{y\in Y}\ [c(x,y)-\xi(y)].
$$
Moreover, we say that a function $\psi$ defined on $Y$ is $\overline{c}$-concave if there exists $\chi:X\to \overline{\mathbb R}$ such that $\psi=\chi^c$, and a function $\varphi$ on $X$ is said to be $c$-concave if there exists $\xi:Y\to \overline{\mathbb R}$ such that $\varphi=\xi^{\overline{c}}$.
We denote by $c\text{-conc}(X)$ and $\overline{c}\text{-conc}(Y)$ the sets of $c$-concave and $\overline{c}$-concave functions, respectively.
(When $X=Y$, and $c$ is symmetric, the distinction between $c$ and $\overline{c}$ will play no more any rule and will be dropped as soon as possible.)
Definition 1.2. A function $f:X\to \mathbb R$ is said to have modulus of continuity $\omega:\mathbb R_+\to \mathbb R_+$ if for all $x,y\in X$
$$
|f(x)-f(y)|\le \omega(d(x,y)).
$$
Lemma 1.4. If $c:X\times Y\to \overline{\mathbb R}$ is continuous and finite on a compact set, then there exists an increasing continuous function $\omega:\mathbb R_+\to \mathbb R_+$ with $\omega(0)=0$ such that
$$
|c(x,y)-c(x’,y’)|\le \omega\bigl(d(x,x’)+d(y,y’)\bigr).
$$
Proof : Let $\omega:\mathbb R_+\to \mathbb R_+$ be defined by
$$
\omega(t):=
\sup\left\{
|c(x,y)-c(x’,y’)|
\ :\
d(x,x’)+d(y,y’)\le t
\right\}.
$$
It is easy to see that $\omega(0)=0$ and $\omega(t)$ is increasing.
Let $r=d(x,x’)+d(y,y’)$, then
$$
|c(x,y)-c(x’,y’)|\le \omega(r)=\omega\bigl(d(x,x’)+d(y,y’)\bigr).
$$
Since $c$ is uniformly continuous on its domain, $\omega$ is continuous.
Lemma 1.5. Let $(f_\alpha)$ be a family of functions, all satisfying the same modulus of continuity $\omega:\mathbb R_+\to \mathbb R_+$,
$$
|f_\alpha(x)-f_\alpha(x’)|\le \omega(d(x,x’)),\qquad \forall \alpha.
$$
Then
$$
f:=\inf_\alpha f_\alpha
$$
also satisfies the same modulus of continuity.
Proof : For each $\alpha$,
$$
f_\alpha(x)\le f_\alpha(x’)+\omega(d(x,x’)),
$$
which implies
$$
f(x)\le f_\alpha(x’)+\omega(d(x,x’))
$$
since $f\le f_\alpha$. Taking the infimum at the RHS, we get
$$
f(x)\le f(x’)+\omega(d(x,x’)).
$$
Interchanging $x$ and $x’$, we obtain
$$
|f(x)-f(x’)|\le \omega(d(x,x’)).
$$
Lemma 1.6. If $c:X\times Y\to \overline{\mathbb R}$ has the modulus of continuity $\omega:\mathbb R_+\to \mathbb R_+$, then $\chi^c$ shares the same modulus of continuity.
Proof : Since
$$
\chi^c(y)=\inf_{x\in X}\bigl(c(x,y)-\chi(x)\bigr)=: \inf_{x\in X} g_x(y),
$$
and for $x\in X$, $y,y’\in Y$,
$$
|g_x(y)-g_x(y’)|=|c(x,y)-c(x,y’)|\le \omega(d(y,y’)),
$$
by Lemma 1.5 we get the desired result.
Lemma 1.7. Let
$$
DP(\varphi,\psi):=\int_X \varphi \ d\mu+\int_Y \psi \ d\nu.
$$
For $\varphi\in C_b(X)$, $\psi\in C_b(Y)$ with $\varphi\oplus\psi\le c$. Then
$$
DP(\varphi,\varphi^c)\ge DP(\varphi,\psi).
$$
Proof : Since $\varphi\oplus\psi\le c$, we have
$$
\psi(y)\le c(x,y)-\varphi(x).
$$
Furthermore
$$
\psi(y)\le \inf_x [c(x,y)-\varphi(x)]=\varphi^c(y).
$$
Hence
$$
DP(\varphi,\psi)\le \int_X \varphi \ d\mu+\int_Y \varphi^c \ d\nu = DP(\varphi,\varphi^c).
$$
Remark. Similarly, we will have
$$
DP(\varphi,\psi)\le DP(\varphi,\varphi^c)\le DP(\varphi^{c\overline{c}},\varphi^c)\le DP(\psi^{\overline{c}},\psi^{\overline{c}c})\le DP(\varphi^{c\overline{c}c},\varphi^{c\overline{c}})\le \cdots
$$
Theorem 1.4. Suppose that $X$ and $Y$ are compact and $c$ is continuous. Then there exists a solution $(\varphi,\psi)$ to problem $(\mathrm{DP})$ and it has the form $\varphi\in c\text{-conc}(X)$, $\psi\in \overline{c}\text{-conc}(X)$ and $\psi=\varphi^c$. In particular,
$$
\max(\mathrm{DP})
=
\max_{\varphi\in c\text{-conc}(X)}
\left[
\int_X \varphi \ d\mu+\int_Y \varphi^c \ d\nu
\right].
\tag{$\star$}
$$
Proof : From the considerations above, we can take a maximizing sequence $(\varphi_n,\psi_n)$ and improve it by means of $c$- and $\overline{c}$-transforms, and we can assume they have the same modulus of continuity as $c$. Instead of renaming the sequence, we will still call $(\varphi_n,\psi_n)$ the new sequence obtained after these transforms.
By Ascoli-Arzelà’s theorem, we only need to check the equiboundedness. For every constant $a\in \mathbb R$,
$$
DP(\varphi,\psi)=DP(\varphi+a,\psi-a)
$$
and
$$
(\varphi+a)\oplus(\psi-a)\le c.
$$
Since $\varphi_n$ is continuous on a compact set and then bounded, W.L.O.G. we can assume
$$
\min \varphi_n=0
$$
and then we get
$$
\max \varphi_n\le \omega(\operatorname{diam} X).
$$
If we have chosen $\psi_n=\varphi_n^c$, we also have
$$
\psi_n(y)=\inf_{x\in X}[c(x,y)-\varphi_n(x)]
\in
\bigl[\min c-\omega(\operatorname{diam} X),\ \max c\bigr].
$$
Hence equibounded. By Ascoli-Arzelà’s theorem, there exists a subsequence
$$
\varphi_{n_k}\to \varphi,\qquad \psi_{n_k}\to \psi\quad \text{(uniform convergence).}
$$
Therefore
$$
\int_X \varphi_{n_k}\ d\mu+\int_Y \psi_{n_k}\ d\nu
\to
\int_X \varphi\ d\mu+\int_X \psi\ d\nu.
$$
and
$$
\varphi_{n_k}(x)+\psi_{n_k}(y)\le c(x,y)
\quad\Rightarrow\quad
\varphi(x)+\psi(y)\le c(x,y).
$$
This shows that $(\varphi,\psi)$ is an admissible pair for $(\mathrm{DP})$ and then optimal.
And if $\psi\neq \varphi^c$, we can improve the objective functional in $(\mathrm{DP})$, which contradicts to $(\varphi,\psi)$ is optimal. Similarly $\varphi=\psi^{\overline{c}}$.
Hence $\psi=\varphi^c\in \overline{c}\text{-conc}(X)$ and similarly $\varphi=\psi^{\overline{c}}\in c\text{-conc}(X)$.
Remark. If $\min(\mathrm{KP})=\max(\mathrm{DP})$, we also have
$$
\min(\mathrm{KP})
=
\max_{\varphi\in c\text{-conc}(X)}
\left[
\int_X \varphi \ d\mu+\int_Y \varphi^c \ d\nu
\right]
$$
which also shows that the minimum value of $(\mathrm{KP})$ is a convex function of $(\mu,\nu)$ as it is a supremum of linear functionals.
Definition 1.3. The functions $\varphi$ realizing the maximum in $(\star)$ are called Kantorovich potentials for the transport from $\mu$ and $\nu$.
Reference
Santambrogio, Filippo. Optimal transport for applied mathematicians. Birkäuser, NY 55.58-63 (2015): 94.
The cover image of this article was taken in Auckland, New Zealand.
Dual Problem of Kantorovich
