The case $c(x,y)=h(x-y)$ for $h$ Strictly Convex and the Existence of an Optimal Transport Map $T$
In this article, we will consider $X=Y=\Omega \subset \mathbb{R}^d$ with $\Omega$ compact and $c(x,y)=h(x-y)$ for $h$ strictly convex.
From now on, we assume the duality result
$$
\min (\mathrm{KP})=\max (\mathrm{DP})
$$
holds true; that is, for an optimal transport plan $\gamma$ and a Kantorovich potential $\varphi$, we have
$$
\int_{\Omega \times \Omega} c \ d\gamma
=
\int_{\Omega} \varphi \ d\mu + \int_{\Omega} \varphi^c \ d\nu.
$$
Since $\varphi(x)+\varphi^c(y)\le c(x,y)$ for every $x,y\in \Omega$, while the two integrals above coincide, the nonnegative function $c(x,y)-\varphi(x)-\varphi^c(y)$ has zero integral against $\gamma$, hence
$$
\varphi(x)+\varphi^c(y)=c(x,y)\quad \gamma\text{-a.s.}
$$
Furthermore, since $\varphi$, $\varphi^c$ and $c$ are continuous, the equality $\varphi(x)+\varphi^c(y)=c(x,y)$ holds on the whole of $\operatorname{supp}(\gamma)$.
Fix $(x_0,y_0)\in \operatorname{supp}(\gamma)$. By the definition of $\varphi^c$,
$$
\varphi^c(y_0)=\inf_{x\in \Omega}[c(x,y_0)-\varphi(x)]\le c(x,y_0)-\varphi(x)
\qquad \forall x\in \Omega.
$$
On the other hand, since $(x_0,y_0)\in \operatorname{supp}(\gamma)$,
$$
\varphi(x_0)+\varphi^c(y_0)=c(x_0,y_0),
$$
which implies
$$
c(x_0,y_0)-\varphi(x_0)=\varphi^c(y_0)\le c(x,y_0)-\varphi(x)
\qquad \forall x\in \Omega.
$$
Therefore
$$
x\longmapsto c(x,y_0)-\varphi(x)
$$
is minimal at $x=x_0$.
If $\varphi$ and $c(\cdot,y_0)$ are differentiable at $x_0$ and $x_0\notin \partial \Omega$, then the first-order condition gives
$$
\nabla_x c(x_0,y_0)-\nabla \varphi(x_0)=0,
$$
namely
$$
\nabla \varphi(x_0)=\nabla_x c(x_0,y_0).
$$
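As a numerical sanity check of the minimality of $x\mapsto c(x,y_0)-\varphi(x)$ and of the first-order condition, here is a toy one-dimensional sketch. The quadratic cost, the potential $\varphi(x)=-\frac12 x^2-x$ and the map $T(x)=2x+1$ (which transports $\mathcal N(0,1)$ onto $\mathcal N(1,4)$) are our own illustrative choices, not from the text:

```python
import numpy as np

# Toy 1D example (our own choice): c(x, y) = (x - y)^2 / 2,
# Kantorovich potential phi(x) = -x^2/2 - x, optimal map T(x) = 2x + 1.
def c(x, y):
    return 0.5 * (x - y) ** 2

def phi(x):
    return -0.5 * x ** 2 - x

x0 = 0.5
y0 = 2 * x0 + 1              # y0 = T(x0), so (x0, y0) lies in supp(gamma)

# x -> c(x, y0) - phi(x) should be minimal at x = x0 ...
xs = np.linspace(-10.0, 10.0, 200001)
x_min = xs[np.argmin(c(xs, y0) - phi(xs))]

# ... and the first-order condition grad phi(x0) = grad_x c(x0, y0) should hold.
grad_phi_x0 = -x0 - 1.0      # phi'(x0)
grad_x_c = x0 - y0           # (d/dx) c(x, y0) at x = x0

print(x_min, grad_phi_x0, grad_x_c)
```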
Proposition 1.1 Suppose $c$ is $C^1$, $\varphi$ is a Kantorovich potential for the cost $c$ in the transport from $\mu$ to $\nu$, and $(x_0,y_0)$ belongs to the support of an optimal transport plan $\gamma$. Then
$$
\nabla \varphi(x_0)=\nabla_x c(x_0,y_0),
$$
provided $\varphi$ is differentiable at $x_0$.
In particular, the gradients of two different Kantorovich potentials coincide on every point $x_0\in \operatorname{supp}(\mu)$ where both the potentials are differentiable.
Proof : The proof is contained in the above considerations. $\square$
Definition 1.4 (Twist condition) For $\Omega \subset \mathbb{R}^d$, we say that $c:\Omega \times \Omega \to \mathbb{R}$ satisfies the twist condition whenever $c$ is differentiable w.r.t. $x$ at every point and the map
$$
y\longmapsto \nabla_x c(x,y)
$$
is injective for every $x\in \Omega$.
Remark By Proposition 1.1, we know for $(x_0,y_0)\in \operatorname{supp}(\gamma)$,
$$
\nabla_x c(x_0,y_0)=\nabla \varphi(x_0).
$$
If $c$ satisfies the twist condition, then for a given $x_0$ at which $\varphi$ is differentiable there is a unique $y_0$ such that
$$
(x_0,y_0)\in \operatorname{supp}(\gamma).
$$
This shows that $\gamma$ is concentrated on a graph.
Now take $c(x,y)=h(x-y)$ with $h$ strictly convex, and suppose $\varphi$ and $h$ are differentiable at $x_0$ and $x_0-y_0$, respectively, with $x_0\notin \partial \Omega$. Then
$$
\nabla_x c(x,y)=\nabla h(x-y).
$$
If $\nabla_x c(x,y_1)=\nabla_x c(x,y_2)$, then
$$
\nabla h(x-y_1)=\nabla h(x-y_2).
$$
Since $h$ is strictly convex, its gradient $\nabla h$ is injective, so
$$
x-y_1=x-y_2,
$$
which implies $y_1=y_2$, and therefore $c$ satisfies the twist condition.
Since
$$
\nabla \varphi(x_0)=\nabla_x c(x_0,y_0)=\nabla h(x_0-y_0),
$$
we deduce
$$
x_0-y_0=(\nabla h)^{-1}(\nabla \varphi(x_0)).
$$
Hence
$$
y_0=x_0-(\nabla h)^{-1}(\nabla \varphi(x_0)).
$$
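The inversion of $\nabla h$ used in the last step can be made concrete. A minimal numerical sketch with the illustrative choice $h(z)=\frac14 z^4$ (strictly convex, $\nabla h(z)=z^3$), which is our own example, not from the text:

```python
import numpy as np

# Our own illustrative choice: h(z) = z^4 / 4, strictly convex on R.
def grad_h(z):
    return z ** 3

def grad_h_inv(p):
    return np.cbrt(p)        # inverse of z -> z^3

# Injectivity of grad_h (the twist condition) makes the inversion well defined:
zs = np.linspace(-3.0, 3.0, 601)
assert np.all(np.diff(grad_h(zs)) > 0)   # strictly increasing, hence injective

# Round trip: if grad phi(x0) = grad h(x0 - y0), then
# y0 = x0 - (grad h)^{-1}(grad phi(x0)) recovers y0.
x0, y0 = 1.0, 0.25
p = grad_h(x0 - y0)          # plays the role of grad phi(x0)
y0_recovered = x0 - grad_h_inv(p)
print(y0_recovered)
```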
By Rademacher's theorem, $\varphi$ is differentiable $\mathcal{L}^d$-a.e. (Kantorovich potentials are Lipschitz, since $c$ is Lipschitz on the compact set $\Omega\times\Omega$). If $\mu\ll \mathcal{L}^d$, then $\nabla \varphi$ exists $\mu$-a.s.
Theorem 1.5 Given $\mu$ and $\nu$ probability measures on a compact domain $\Omega\subset \mathbb{R}^d$, there exists an optimal transport plan $\gamma$ for the cost $c(x,y)=h(x-y)$ with $h$ strictly convex. It is unique and of the form $(id \times T)_\sharp \mu$, provided $\mu\ll \mathcal{L}^d$ and $\partial \Omega$ is negligible. Moreover, the map $T$ and a Kantorovich potential $\varphi$ are linked by
$$
T(x)=x-(\nabla h)^{-1}(\nabla \varphi(x)).
$$
Proof : The proof is contained in the previous considerations; uniqueness follows from Proposition 1.1, which shows that $\nabla \varphi$ is uniquely determined $\mu$-a.e., and hence so is the optimal map $T$. $\square$
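In dimension one the map produced by Theorem 1.5 can be checked by Monte Carlo: for any strictly convex $h$, the optimal map is the monotone rearrangement $T=F_\nu^{-1}\circ F_\mu$ (a classical one-dimensional fact, not proved here). A minimal sketch with the illustrative choice $\mu=\mathrm{Unif}(0,1)$, $\nu=\mathrm{Unif}(0,2)$, for which $T(x)=2x$:

```python
import numpy as np

rng = np.random.default_rng(0)

# In 1D, for any strictly convex h, the optimal map is the monotone
# rearrangement T = F_nu^{-1} o F_mu.  For mu = Unif(0,1), nu = Unif(0,2):
def T(x):
    return 2.0 * x

x = rng.uniform(0.0, 1.0, size=100_000)   # samples of mu
y = T(x)                                  # samples of the pushforward T_# mu

# Compare empirical quantiles of T_# mu with those of nu = Unif(0, 2).
qs = np.linspace(0.05, 0.95, 19)
emp = np.quantile(y, qs)
exact = 2.0 * qs                          # quantile function of Unif(0, 2)
print(np.max(np.abs(emp - exact)))
```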
Remark. Whenever we know that every optimal plan $\gamma$ must be induced by a map $T$, uniqueness follows.
Proof : If we have two different plans
$$
\gamma_1=\gamma_{T_1}, \qquad \gamma_2=\gamma_{T_2}
$$
are optimal. Then
$$
\bar{\gamma}:=\frac12\gamma_1+\frac12\gamma_2
$$
is also optimal, but it cannot be induced by a map unless $T_1=T_2$, $\mu$-a.e., which gives a contradiction.
Indeed, let
$$
A:=\{x\in X:\ T_1(x)\ne T_2(x)\}.
$$
Suppose $\mu(A)>0$ and there exists a map $T:X\to Y$ such that
$$
\bar{\gamma}=\gamma_T=(id\times T)_\sharp\mu.
$$
For $B\in \mathcal{B}(X)$, consider
$$
\operatorname{Graph}(T|_B):=\{(x,T(x)):\ x\in B\}.
$$
We have
$$
\begin{aligned}
\bar{\gamma}(\operatorname{Graph}(T|_B)) &= (id\times T)_\sharp\mu\big(\{(x,T(x)):\ x\in B\}\big)\\
&=\mu\big((id\times T)^{-1}\{(x,T(x)):\ x\in B\}\big)\\
&=\mu(B).
\end{aligned}
$$
Define
$$
B_1=\{x\in A:\ T(x)=T_1(x)\}, \qquad B_2=\{x\in A:\ T(x)=T_2(x)\}.
$$
By the definition of $A$, we have $B_1\cap B_2=\varnothing$.
First, we claim $\mu\big(A\setminus (B_1\cup B_2)\big)=0.$
If not, then $C:=A\setminus (B_1\cup B_2)$ satisfies $\mu(C)>0$. Consider
$$
E:=\operatorname{Graph}(T|_C)=\{(x,T(x)):\ x\in C\}.
$$
We have
$$
\bar{\gamma}(E)=\mu(C)>0.
$$
However,
$$
\gamma_1(E)
=
\mu\big(\{x\in C:\ (x,T_1(x))\in E\}\big)
=
\mu\big(\{x\in C:\ T_1(x)=T(x)\}\big)=0,
$$
since $\{x\in C:\ T_1(x)=T(x)\}\subset B_1\cap C=\varnothing$.
Similarly $\gamma_2(E)=0$. Therefore,
$$
\bar{\gamma}(E)=\frac12\gamma_1(E)+\frac12\gamma_2(E)=0,
$$
which gives a contradiction. Therefore
$$
\mu\big(A\setminus (B_1\cup B_2)\big)=0.
$$
Next, since
$$
A=B_1\cup B_2\cup \big(A\setminus (B_1\cup B_2)\big)
$$
and $B_1\cap B_2=\varnothing$,
$$
\mu(A)=\mu(B_1)+\mu(B_2)>0.
$$
Therefore, either $\mu(B_1)>0$ or $\mu(B_2)>0$. Without loss of generality, we assume $\mu(B_1)>0$.
Finally, consider
$$
F:=\operatorname{Graph}(T_2|_{B_1})=\{(x,T_2(x)):\ x\in B_1\}.
$$
We have
$$
\gamma_2(F)=\mu\big(\{x\in B_1:\ (x,T_2(x))\in F\}\big)=\mu(B_1),
$$
therefore
$$
\bar{\gamma}(F)\ge \frac12\gamma_2(F)=\frac12\mu(B_1)>0.
$$
However,
$$
\bar{\gamma}(F)
=
\mu\big(\{x\in X:\ (x,T(x))\in F\}\big)
=
\mu\big(\{x\in B_1:\ T(x)=T_2(x)\}\big)
=
\mu(\varnothing)=0,
$$
which gives a contradiction. $\square$
The quadratic case in $\mathbb{R}^d$: $c(x,y)=\frac12|x-y|^2$
Proposition 1.2 Given a function $\chi:\mathbb{R}^d\to \mathbb{R}\cup\{+\infty\}$, let us define
$$
u_\chi:\mathbb{R}^d\to \mathbb{R}\cup\{+\infty\}
$$
$$
u_\chi(x)=\frac12|x|^2-\chi(x).
$$
Then we have
$$
u_{\chi^c}=(u_\chi)^\star
$$
where $f^\star$ denotes the Legendre-Fenchel transform of $f$.
In particular, a function $\varphi$ is $c$-concave if and only if
$$
x\longmapsto \frac12|x|^2-\varphi(x)
$$
is convex and lower semi-continuous.
Proof :
$$
\begin{aligned}
u_{\chi^c}(x)&=\frac12|x|^2-\chi^c(x)\\
&=\sup_y\left[\frac12|x|^2-\frac12|x-y|^2+\chi(y)\right]\\
&=\sup_y\left[x\cdot y-\left(\frac12|y|^2-\chi(y)\right)\right]\\
&=(u_\chi)^\star(x).
\end{aligned}
$$
If $\varphi$ is $c$-concave, then there exists a function $\chi$ such that
$$
\varphi=\chi^c,
$$
then
$$
u_\varphi(x)=\frac12|x|^2-\varphi(x)=(u_\chi)^\star(x),
$$
which is convex and lower semi-continuous.
Conversely, if $u_\varphi(x)=\frac12|x|^2-\varphi(x)$ is convex and lower semi-continuous, then there exists a function $\chi$ such that (one may take $\chi=u_\varphi^\star$, since $u_\varphi=(u_\varphi^\star)^\star$ for convex l.s.c. functions)
$$
u_\varphi=\chi^\star.
$$
Then
$$
\begin{aligned}
\varphi(x)&=\frac12|x|^2-\chi^\star(x)=\frac12|x|^2-\sup_y[x\cdot y-\chi(y)]\\
&=\inf_y\left[\frac12|x-y|^2-\left(\frac12|y|^2-\chi(y)\right)\right]=\zeta^c(x),
\qquad \zeta(y):=\frac12|y|^2-\chi(y),
\end{aligned}
$$
which shows that $\varphi$ is $c$-concave. $\square$
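Proposition 1.2 can also be checked numerically on a grid, where the infimum defining $\chi^c$ and the supremum defining the Legendre transform become a discrete min and max. The test function $\chi(y)=\sin y$ and the grids below are our own choices:

```python
import numpy as np

# Discrete check of u_{chi^c} = (u_chi)^* for the cost c(x,y) = |x - y|^2 / 2.
# The test function chi(y) = sin(y) is our own illustrative choice.
xs = np.linspace(-1.0, 1.0, 201)
ys = np.linspace(-5.0, 5.0, 1001)
chi = np.sin(ys)

# chi^c(x) = inf_y [ c(x, y) - chi(y) ], computed as a min over the y-grid.
chi_c = np.min(0.5 * (xs[:, None] - ys[None, :]) ** 2 - chi[None, :], axis=1)
u_chi_c = 0.5 * xs ** 2 - chi_c                     # u_{chi^c}

# (u_chi)^*(x) = sup_y [ x.y - u_chi(y) ] with u_chi(y) = |y|^2/2 - chi(y).
u_chi = 0.5 * ys ** 2 - chi
legendre = np.max(xs[:, None] * ys[None, :] - u_chi[None, :], axis=1)

print(np.max(np.abs(u_chi_c - legendre)))
```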
By Theorem 1.5 (here $h(z)=\frac12|z|^2$, so $\nabla h=(\nabla h)^{-1}=id$), there exists an optimal transport map
$$
T(x)=x-\nabla \varphi(x)=\nabla\left(\frac12|x|^2-\varphi(x)\right)=\nabla u_\varphi(x).
$$
By Proposition 1.2, $u_\varphi$ is convex.
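The conclusion that the quadratic-cost optimal map is the gradient of a convex function is Brenier's theorem, which the above recovers. A minimal numerical sketch under our own choices (the convex function $u(x)=x^2+x$ and the Gaussian marginals are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

# Our own illustrative choice: u(x) = x^2 + x is convex, so
# phi(x) = x^2/2 - u(x) = -x^2/2 - x is c-concave (Proposition 1.2),
# and T(x) = u'(x) = 2x + 1 = x - phi'(x).
def T(x):
    return 2.0 * x + 1.0

x = rng.standard_normal(200_000)   # samples of mu = N(0, 1)
y = T(x)                           # T_# mu should be N(1, 4)

print(y.mean(), y.std())
```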
Reference
Santambrogio, Filippo. Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling. Birkhäuser, 2015.
The cover image of this article was taken upon arriving in Kagoshima, Japan, aboard a Royal Caribbean cruise.
