Sufficient Conditions for Optimality and Stability
In this article, we will show some sufficient conditions for optimality and stability.
Theorem 1.12 Let $\Omega \subset \mathbb{R}^d$ be compact and $c$ be $C^1$ cost function satisfying the twist condition on $\Omega \times \Omega$. Suppose that $\mu \in \mathcal{P}(\Omega)$ and $\varphi \in c\text{-cone}(\Omega)$ are given, that $\varphi$ is differentiable $\mu$-a.e. and that $\mu(\partial \Omega)=0$. Suppose that the map $T$ satisfies
$$
\nabla_x c(x,T(x))=\nabla \varphi(x).
$$
Then $T$ is optimal for the transport cost $c$ between the measures $\mu$ and
$$
\nu:=T_{\sharp}\mu.
$$
Proof : Since $\varphi \in c\text{-cone}(\Omega)$, $\varphi^{cc}=\varphi$, take $\psi=\varphi^c$ we have $\varphi=\psi^c$.
Fix $x_0\in \Omega$ such that $\nabla \varphi(x_0)$ exists and $x_0\notin \partial \Omega$. By compactness and continuity $\exists\ y_0\in \Omega$ such that
$$
\varphi(x_0)=\inf_y [c(x_0,y)-\psi(y)]=c(x_0,y_0)-\psi(y_0).
$$
This gives,
$$
\varphi(x)\leq c(x,y_0)-\psi(y_0)
$$
and
$$
\varphi(x_0)=c(x_0,y_0)-\psi(y_0).
$$
and hence
$$
x\mapsto c(x,y_0)-\varphi(x)
$$
is minimal at $x=x_0$. As a consequence, we get
$$
\nabla \varphi(x_0)=\nabla_x c(x_0,y_0).
$$
By assumption, $y\mapsto \nabla_x c(x,y)$ is injective, and $\nabla_x c(x_0,T(x_0))=\nabla \varphi(x_0)$. We have $y_0=T(x_0)$. This proves
$$
\varphi(x_0)+\psi(T(x_0))=c(x_0,T(x_0)),
$$
and this same equality is true for $\mu$-a.e. $x_0$.
Integrate with respect to $\mu$, we have
$$
\int_{\Omega} c(x,T(x))\ d\mu(x)
=
\int_{\Omega} \varphi(x)+\psi(T(x))\ d\mu(x)
=
\int_{\Omega} \varphi\ d\mu+\int_{\Omega} \psi\ d\nu.
$$
Hence
$$
\max(DP)\geq \int_{\Omega}\varphi\ d\mu+\int_{\Omega}\psi\ d\nu
=
\int_{\Omega} c(x,T(x))\ d\mu(x)
\geq \min(KP)\geq \max(DP).
$$
We have $T$ is optimal. $\square$
Theorem 1.13 Suppose $\mu\in \mathcal{P}(\mathbb{R}^d)$ is such that
$$
\int_{\mathbb{R}^d} |x|^2\ d\mu(x)<+\infty,
$$
and that $u:\mathbb{R}^d\to \mathbb{R}\cup\{+\infty\}$ is convex and differentiable $\mu$-a.e.
Set
$$
T=\nabla u
$$
and suppose
$$
\int_{\mathbb{R}^d} |T(x)|^2\ d\mu(x)<+\infty.
$$
Then $T$ is optimal for the transport cost
$$
c(x,y):=\frac{1}{2}|x-y|^2
$$
between the measures $\mu$ and
$$
\nu:=T_{\sharp}\mu.
$$
Proof : Note that for a convex function $u$
$$
u(x)+u^{\star}(y)\geq x\cdot y
$$
for all $x,y\in \mathbb{R}^d$ and
$$
u(x)+u^{\star}(y)=x\cdot y
$$
if $y=\nabla u(x)$.
Now consider $\gamma\in \Pi(\mu,\nu)$
$$
\int_{\mathbb{R}^d\times \mathbb{R}^d} x\cdot y\ d\gamma(x,y)
\leq
\int_{\mathbb{R}^d\times \mathbb{R}^d} (u(x)+u^{\star}(y))\ d\gamma(x,y)
=
\int_{\mathbb{R}^d} u(x)\ d\mu(x)+\int_{\mathbb{R}^d} u^{\star}(T(x))\ d\mu(x)=
\int_{\mathbb{R}^d} x\cdot T(x)\ d\mu(x)
=
\int_{\mathbb{R}^d\times \mathbb{R}^d} x\cdot y\ d\gamma_T.
$$
Therefore,
$$
\int_{\mathbb{R}^d\times \mathbb{R}^d} \frac{1}{2}|x-y|^2\ d\gamma(x,y)
=
\int_{\mathbb{R}^d\times \mathbb{R}^d} \frac{1}{2}(|x|^2+|y|^2)\ d\gamma(x,y)
-
\int_{\mathbb{R}^d\times \mathbb{R}^d} x\cdot y\ d\gamma(x,y)=
\int_{\mathbb{R}^d\times \mathbb{R}^d} \frac{1}{2}(|x|^2+|y|^2)\ d\gamma_T
-
\int_{\mathbb{R}^d\times \mathbb{R}^d} x\cdot y\ d\gamma_T=
\int_{\mathbb{R}^d\times \mathbb{R}^d} \frac{1}{2}|x-y|^2\ d\gamma_T,
$$
which shows $T$ is optimal. $\square$
Theorem 1.14 Suppose that $\gamma\in \mathcal{P}(X\times Y)$ is given, that $X$ and $Y$ are Polish spaces, that $c:X\times Y\to \mathbb{R}$ is uniformly continuous and bounded, and that $\operatorname{supp}(\gamma)$ is $c$-CM.
Then $\gamma$ is an optimal transport plan between its marginals
$$
\mu=(\pi_x) _{\sharp}\gamma
\quad \text{and} \quad
\nu=(\pi_y) _{\sharp}\gamma
$$
for the cost $c$.
Proof : By Theorem 1.6, there exists a $c$-concave function $\varphi$ such that
$$
\operatorname{supp}(\gamma)\subset \{(x,y):\varphi(x)+\varphi^c(y)=c(x,y)\}
$$
and both $\varphi$ and $\varphi^c$ are continuous and bounded.
By Theorem 1.8, we know the duality result, hence
$$
\min(KP)\leq \int_{X\times Y} c(x,y)\ d\gamma
=
\int_{X\times Y} \varphi(x)+\varphi^c(y)\ d\gamma
=
\int_X \varphi\ d\mu+\int_Y \varphi^c\ d\nu\leq \max(DP)\leq \min(KP),
$$
which shows that $\gamma$ is optimal. $\square$
Definition 1.9 (Hausdorff distance) For a compact metric space $X$, we define the Hausdorff distance on pair of compact subsets of $X$ by
$$
d_H(A,B):=
\max\left\{
\max\{d(x,A):x\in B\},
\max\{d(x,B):x\in A\}
\right\}.
$$
Remark : We have the following equivalent definition.
$d_H(A,B)=\max\{|d(x,A)-d(x,B)|:x\in X\}$.
$d_H(A,B)=\inf\{\varepsilon>0:A\subset B_{\varepsilon},\ A_{\varepsilon}\supset B\}$, where $A_{\varepsilon},B_{\varepsilon}$ stands for the $\varepsilon$-neighborhood of $A$ and $B$ respectively.
Theorem 1.15 (Blaschke) $d_H$ is a distance.
Proposition 1.5 If $d_H(A_n,A)\to 0$ and $\mu_n$ is a sequence of positive measures such that
$$
\operatorname{supp}(\mu_n)\subset A_n
$$
with $\mu_n\rightharpoonup \mu$, then
$$
\operatorname{supp}(\mu)\subset A.
$$
Theorem 1.16 Suppose that $X$ and $Y$ are compact metric spaces and that $c:X\times Y\to \mathbb{R}$ is continuous. Suppose that $\gamma_n\in \mathcal{P}(X\times Y)$ is a sequence of transport plan which are optimal for cost $c$ between their own marginals
$$
\mu_n=(\pi_x) _{\sharp}\gamma_n
\quad \text{and} \quad
\nu_n=(\pi_y) _{\sharp}\gamma_n
$$
and suppose $\gamma_n\rightharpoonup \gamma$. Then
$$
\mu_n \to \mu_{\infty}:=(\pi_x) _{\sharp} \gamma,
\quad
\nu_n \to \nu:= (\pi_y) _{\sharp} \gamma,
$$
and $\gamma$ is optimal in the transport between $\mu$ and $\nu$.
Proof : Set $\Gamma_n:=\operatorname{supp}(\gamma_n)$, up to subsequence, we can assume $\Gamma_n\to \Gamma$ in Hausdorff distance. Each $\Gamma_n$ is a $c$-CM set, now we claim $\Gamma$ is also a $c$-CM set.
Fix $(x_1,y_1),\dots,(x_k,y_k)\in \Gamma$, there are points $(x_1^n,y_1^n),\dots,(x_k^n,y_k^n)\in \Gamma_n$ such that
$$
(x_i^n,y_i^n)\to (x_i,y_i),\qquad i=1,2,\dots,k.
$$
The cyclical monotonicity of $\Gamma_n$ gives
$$
\sum_{i=1}^k c(x_i^n,y_i^n)\leq \sum_{i=1}^k c(x_i^n,y_{\sigma(i)}^n).
$$
Take $n\to\infty$
$$
\sum_{i=1}^k c(x_i,y_i)\leq \sum_{i=1}^k c(x_i,y_{\sigma(i)}).
$$
Hence $\Gamma$ is a $c$-CM set. Since $\gamma_n\rightharpoonup \gamma$ and $\Gamma_n\to \Gamma$ in Hausdorff distance, we have
$$
\operatorname{supp}(\gamma)\subset \Gamma,
$$
which is also a $c$-CM set and implies the optimality of $\gamma$. $\square$
Notation : For a given cost $c:X\times Y\to \mathbb{R}$ and $\mu\in \mathcal{P}(X)$, $\nu\in \mathcal{P}(Y)$, define
$$
J_c(\mu,\nu):=
\min\left\{
\int_{X\times Y} c(x,y)\ d\gamma;\ \gamma\in \Pi(\mu,\nu)
\right\}.
$$
Theorem 1.17 Suppose that $X$ and $Y$ are compact metric spaces and that $c:X\times Y\to \mathbb{R}$ is continuous. Suppose that $\mu_n\in \mathcal{P}(X)$, $\nu_n\in \mathcal{P}(Y)$ with
$$
\mu_n\to \mu
\qquad \text{and} \qquad
\nu_n\to \nu.
$$
Then we have
$$
J_c(\mu_n,\nu_n)\to J_c(\mu,\nu).
$$
Proof : Let $\gamma_n$ be an optimal transport plan from $\mu_n$ to $\nu_n$ for the cost $c$. Up to subsequences, we can assume
$$
\gamma_n\rightharpoonup \gamma.
$$
By Theorem 1.16, $\gamma$ is optimal for $\mu$ to $\nu$. This means
$$
J_c(\mu_n,\nu_n)=\int_{X\times Y} c(x,y)\ d\gamma_n
\to
\int_{X\times Y} c\ d\gamma
=
J_c(\mu,\nu),
$$
which proves the claim.
Theorem 1.18 Suppose that $X$ and $Y$ are compact metric spaces, and that $c:X\times Y\to \mathbb{R}$ is continuous. Suppose that $\mu_n\in \mathcal{P}(X)$ and $\nu_n\in \mathcal{P}(Y)$, with
$$
\mu_n\to \mu
\qquad \text{and} \qquad
\nu_n\to \nu.
$$
Let $(\varphi_n,\psi_n)$ be a pair of $c$-concave Kantorovich potentials for the cost $c$ in the transport from $\mu_n$ to $\nu_n$. Then up to a subsequences, we have
$$
\varphi_n\to \varphi
\qquad \text{and} \qquad
\psi_n\to \psi
$$
where convergence is uniform and $(\varphi,\psi)$ is a pair of Kantorovich potentials for $\mu$ and $\nu$.
Proof : It is easy to see that $\{\varphi_n\}$ and $\{\psi_n\}$ are equibounded and equicontinuity.
By Arzelà–Ascoli theorem, we have
$$
\varphi_n\to \varphi,\qquad \psi_n\to \psi
$$
up to a subsequence, and the convergence is uniform.
Since
$$
\varphi_n(x)+\psi_n(y)\leq c(x,y),
$$
we have
$$
\varphi(x)+\psi(y)\leq c(x,y).
$$
Moreover
$$
J_c(\mu_n,\nu_n)
=
\int_X \varphi_n\ d\mu_n+\int_Y \psi_n\ d\nu_n
\to
\int_X \varphi\ d\mu+\int_Y \psi\ d\nu.
$$
By Theorem 1.17, we also have
$$
J_c(\mu_n,\nu_n)\to J_c(\mu,\nu).
$$
Hence
$$
J_c(\mu,\nu)=\int_X \varphi\ d\mu+\int_Y \psi\ d\nu,
$$
which implies they are Kantorovich potentials. $\square$
Reference
Santambrogio, Filippo. Optimal transport for applied mathematicians. Birkäuser, NY 55.58-63 (2015): 94.
The cover image of this article was taken in Sydney, Australia.
Sufficient Conditions for Optimality and Stability
