Gradient Flows in Wasserstein Space
In this article, we introduce gradient flows in Wasserstein space.
For the Wasserstein space $W_p(\Omega) := (\mathcal P_p(\Omega), W_p)$, standard considerations from fluid mechanics tell us that the density $\mu_t$ of a family of particles moving with velocity $v_t$ satisfies the continuity equation
$$
\partial_t \mu_t + \nabla \cdot (\mu_t v_t) = 0
$$
with a vector field $v_t\in L^p(\mu_t)$. Moreover, we want to relate the $L^p(\mu_t)$ norm of $v_t$ to the metric derivative $|\mu'|(t)$.
Theorem 1. Let $(\mu_t)_{t\in[0,1]}$ be an absolutely continuous curve in $W_p(\Omega)$ (for $p>1$, and $\Omega\subset \mathbb R^d$ an open domain). Then for a.e. $t\in[0,1]$, there exists a vector field $v_t\in L^p(\mu_t;\mathbb R^d)$ such that
- the continuity equation $\partial_t \mu_t + \nabla\cdot(\mu_t v_t)=0$ is satisfied in the sense of distributions,
- for a.e. $t$, we have
$$
\Vert v_t\Vert_{L^p(\mu_t)} \le |\mu'|(t),
$$
where $|\mu'|(t)$ denotes the metric derivative at time $t$ of the curve $t\mapsto \mu_t$ w.r.t. the distance $W_p$.
Conversely, if $(\mu _t)$ is a family of measures in $\mathcal P_p (\Omega)$ and for each $t$, we have a vector field $v_t\in L^p(\mu_t;\mathbb R^d)$ with $\int_0^1 \Vert v_t\Vert _{L^p(\mu_t)}\ dt<+\infty$ solving
$$
\partial_t \mu_t + \nabla\cdot(\mu_t v_t)=0,
$$
then $(\mu _t)$ is absolutely continuous in $W _p(\Omega)$ and for a.e. $t$, we have
$$
|\mu'|(t)\le \Vert v_t\Vert_{L^p(\mu_t)}.
$$
Note that as a consequence of the second part of the statement, the vector field $v_t$ introduced in the first part must satisfy
$$
\Vert v_t\Vert_{L^p(\mu_t)} = |\mu'|(t).
$$
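Theorem 1 can be checked numerically in one dimension. The sketch below is only an illustration under stated assumptions: the measure $\mu_t$ is an empirical measure of equally weighted particles, the exponent $p=3$ and the velocities `u` are hypothetical choices, and the velocities are nondecreasing in the initial positions so that particles never cross. In that setting the sorted matching is the optimal coupling, and the difference quotient of $W_p$ should agree with $\Vert v_t\Vert_{L^p(\mu_t)}$.

```python
import numpy as np

def wasserstein_p_1d(x, y, p):
    """W_p between two empirical measures on R with equally many, equally weighted atoms.

    In one dimension the optimal coupling matches sorted samples.
    """
    x_sorted, y_sorted = np.sort(x), np.sort(y)
    return np.mean(np.abs(x_sorted - y_sorted) ** p) ** (1.0 / p)

p = 3.0                                   # hypothetical exponent
rng = np.random.default_rng(0)
x0 = np.sort(rng.normal(size=200))        # initial particle positions
u = 1.0 + 0.5 * np.tanh(x0)               # nondecreasing velocities: no crossings

def particles(t):
    """Positions at time t; the empirical measure of these points is mu_t."""
    return x0 + t * u

t, h = 0.3, 1e-3
# Difference quotient approximating the metric derivative |mu'|(t).
metric_derivative = wasserstein_p_1d(particles(t), particles(t + h), p) / h
# L^p(mu_t) norm of the velocity field carried by the particles.
velocity_norm = np.mean(np.abs(u) ** p) ** (1.0 / p)
print(metric_derivative, velocity_norm)
```

Here the two numbers coincide up to rounding because the particles keep their order; with crossing particles the velocity carried by the particles would no longer be the minimal-norm field of the theorem.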
McCann’s displacement interpolation
Theorem 2. If $\Omega\subset \mathbb R^d$ is convex, then all the spaces $W_p(\Omega)$ are length spaces and if $\mu$ and $\nu$ belong to $W_p(\Omega)$ and $\gamma$ is the optimal transport plan from $\mu$ to $\nu$ for the cost $c_p(x,y)=|x-y|^p$, then the curve
$$
\mu^\gamma(t):=(\pi_t)_\sharp \gamma
$$
where $\pi_t:\Omega\times\Omega\to\Omega$ is given by
$$
\pi_t(x,y)=(1-t)x+ty
$$
is a constant-speed geodesic from $\mu$ to $\nu$. In the case $p>1$, all constant-speed geodesics are of this form, and, if $\mu$ is absolutely continuous, then there is only one geodesic and it has the form
$$
\mu_t=(T_t)_\sharp \mu,\qquad T_t=(1-t)\operatorname{id}+tT,
$$
where $T$ is the optimal transport map from $\mu$ to $\nu$. In this case, the velocity field $v_t$ of the geodesic $\mu_t$ is given by
$$
v_t=(T-\operatorname{id})\circ (T_t)^{-1}.
$$
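In particular, when $p=2$ and the cost is $\tfrac12|x-y|^2$, so that $T=\operatorname{id}-\nabla\varphi$ and $T^{-1}=\operatorname{id}-\nabla\psi$, we have $v_0=-\nabla\varphi$ at $t=0$ and $v_1=\nabla\psi$ at $t=1$, where $\varphi$ is the Kantorovich potential in the transport from $\mu$ to $\nu$ and $\psi=\varphi^c$.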
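To see the constant-speed property concretely, here is a minimal one-dimensional sketch with $p=2$. It assumes empirical measures with equally many, equally weighted samples (the Gaussian samples are a hypothetical choice), for which the optimal map simply sends the $i$-th smallest sample of $\mu$ to the $i$-th smallest sample of $\nu$.

```python
import numpy as np

def wasserstein_2_1d(x, y):
    """W_2 between equally weighted empirical measures on R (sorted matching)."""
    return np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2))

rng = np.random.default_rng(1)
x = np.sort(rng.normal(loc=-2.0, scale=0.5, size=500))   # samples of mu
y = np.sort(rng.normal(loc=3.0, scale=1.5, size=500))    # samples of nu, y[i] = T(x[i])

def geodesic(t):
    """Samples of (T_t)_# mu with T_t = (1 - t) id + t T."""
    return (1.0 - t) * x + t * y

total = wasserstein_2_1d(x, y)
s, t = 0.2, 0.7
segment = wasserstein_2_1d(geodesic(s), geodesic(t))
# Constant speed: W_2(mu_s, mu_t) = |t - s| * W_2(mu, nu).
print(segment, (t - s) * total)
```

Each particle moves along a straight line with velocity `y[i] - x[i]`, which is the discrete counterpart of $v_t=(T-\operatorname{id})\circ(T_t)^{-1}$.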
Using the characterization of constant-speed geodesics as minimizers of a strictly convex kinetic energy, we have:
- Looking for an optimal transport for the cost $c(x,y)=|x-y|^p$ is equivalent to looking for constant-speed geodesics in $W_p$.
- Constant-speed geodesics may be found by minimizing $\int_0^1 |\mu'|(t)^p\ dt$.
- In the case of $W_p$, we have $|\mu'|(t)^p=\int_\Omega |v_t|^p\ d\mu_t$, where $v_t$ is the velocity field given by Theorem 1, which solves the continuity equation together with $\mu_t$.
As a consequence of these considerations, for $p>1$, solving the kinetic energy minimization problem
$$
\min \left\{ \int_0^1 \int_\Omega |v_t|^p\ d\rho_t\ dt \ ; \ \partial_t \rho_t+\nabla\cdot(\rho_t v_t)=0,\ \rho_0=\mu,\ \rho_1=\nu \right\}
$$
selects constant-speed geodesics connecting $\mu$ to $\nu$ and hence allows one to find the optimal transport between $\mu$ and $\nu$. This is usually called the Benamou–Brenier formula.
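As a quick consistency check, we can evaluate the kinetic energy along the geodesic of Theorem 2, assuming $\mu$ is absolutely continuous so that the optimal map $T$ exists. With $\mu_t=(T_t)_\sharp\mu$ and $v_t=(T-\operatorname{id})\circ(T_t)^{-1}$, the change of variables $y=T_t(x)$ gives, for every $t\in[0,1]$,
$$
\int_\Omega |v_t(y)|^p\ d\mu_t(y)
=
\int_\Omega |T(x)-x|^p\ d\mu(x)
=
W_p^p(\mu,\nu),
\qquad\text{so}\qquad
\int_0^1\int_\Omega |v_t|^p\ d\mu_t\ dt = W_p^p(\mu,\nu).
$$
In other words, the minimal value of the kinetic energy is $W_p^p(\mu,\nu)$, and it is attained by the displacement interpolation.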
Minimizing Movement Scheme in the Wasserstein Space and Evolution PDEs
From now on, we work in $W_2(\Omega)$. Consider the minimizing movement scheme (MMS)
$$
\rho_{k+1}^\tau \in \operatorname{argmin}_\rho \left\{ F(\rho)+\frac{W_2^2(\rho,\rho_k^\tau)}{2\tau} \right\}, \tag{1}
$$
and denote
$$
\mathcal T_c(\rho,\nu):=\min\left\{ \int c(x,y)\ d\gamma,\ \gamma\in\Pi(\rho,\nu) \right\},
$$
which we use with $\nu=\rho_k^\tau$ and $c(x,y)=|x-y|^2$.
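To get a feeling for scheme (1), consider the simplest possible case, which we assume here purely for illustration: $F(\rho)=\int V\ d\rho$ with the hypothetical potential $V(x)=x^2/2$ on $\mathbb R$, and $\rho_k^\tau=\delta_{x_k}$ a single Dirac mass. Since $W_2^2(\rho,\delta_{x_k})=\int |x-x_k|^2\ d\rho(x)$, the objective is the integral of $V(x)+|x-x_k|^2/(2\tau)$ against $\rho$, so the minimizer is again a Dirac mass, located at the minimizer of the integrand. The scheme then reduces to the implicit Euler method for $x'=-V'(x)$.

```python
import numpy as np

def jko_step_dirac(x_k, tau):
    """One step of (1) for rho_k = delta_{x_k} and F(rho) = int V d rho, V(x) = x^2 / 2.

    The integrand V(x) + |x - x_k|^2 / (2 tau) is minimized at x_k / (1 + tau),
    which is exactly an implicit Euler step for x' = -x.
    """
    return x_k / (1.0 + tau)

def run_scheme(x0, tau, final_time):
    """Iterate the scheme up to final_time and return the final particle position."""
    n_steps = int(round(final_time / tau))
    x = x0
    for _ in range(n_steps):
        x = jko_step_dirac(x, tau)
    return x

x0, final_time = 2.0, 1.0
exact = x0 * np.exp(-final_time)      # solution of x' = -V'(x) = -x at final_time
for tau in (0.1, 0.01, 0.001):
    print(tau, run_scheme(x0, tau, final_time), exact)
```

As $\tau\to 0$ the discrete positions approach the exact gradient-flow trajectory $x_0e^{-t}$, which is the behaviour we expect from the scheme in general.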
Given a functional $G:\mathcal P(\Omega)\to\mathbb R$, we call $\dfrac{\delta G}{\delta \rho}(\rho)$, if it exists, the unique (up to additive constants) function such that
$$
\left.\frac{d}{d\varepsilon}G(\rho+\varepsilon\chi)\right|_{\varepsilon=0}
=
\int \frac{\delta G}{\delta \rho}(\rho)\ d\chi,
$$
for every perturbation $\chi$ such that $\rho+\varepsilon\chi\in\mathcal P(\Omega)$ at least for $\varepsilon\in[0,\bar\varepsilon]$. The function $\dfrac{\delta G}{\delta \rho}(\rho)$ is called the first variation of the functional $G$ at $\rho$.
Examples: Let $f:\mathbb R\to\mathbb R$ be a convex superlinear function, and let $V:\Omega\to\mathbb R$ and $W:\mathbb R^d\to\mathbb R$ be regular enough, with $W$ symmetric, i.e.
$$
W(z)=W(-z).
$$
We have three functionals
$$
\mathcal F(\rho)=
\begin{cases}
\displaystyle \int f(\rho(x))\ dx & \text{if }\rho\ll \operatorname{leb},\\
+\infty & \text{otherwise},
\end{cases}
$$
$$
\mathcal V(\rho)=\int V\ d\rho,\qquad
\mathcal W(\rho)=\frac12\iint W(x-y)\ d\rho(x)\ d\rho(y).
$$
Then we have
$$
\frac{\delta\mathcal F}{\delta \rho}(\rho)=f'(\rho),\qquad
\frac{\delta\mathcal V}{\delta \rho}(\rho)=V,\qquad
\frac{\delta\mathcal W}{\delta \rho}(\rho)=W\star \rho.
$$
Proof. We have
$$
\frac{d}{d\varepsilon} \mathcal F(\rho+\varepsilon\chi)\Big| _{\varepsilon=0}
=
\lim _{\varepsilon\to 0} \frac{\mathcal F(\rho+\varepsilon\chi)-\mathcal F(\rho)}{\varepsilon}
=
\lim _{\varepsilon\to 0}\int \frac{f(\rho(x)+\varepsilon\chi(x))-f(\rho(x))}{\varepsilon}\ dx
=
\int f^\prime(\rho(x))\chi(x)\ dx
=
\int f^\prime(\rho(x))\ d\chi(x).
$$
Hence
$$
\frac{\delta\mathcal F}{\delta \rho}(\rho)=f'(\rho).
$$
Moreover,
$$
\frac{d}{d\varepsilon}\mathcal V(\rho+\varepsilon\chi)\Big| _{\varepsilon=0}
=
\lim _{\varepsilon\to 0}\frac{\mathcal V(\rho+\varepsilon\chi)-\mathcal V(\rho)}{\varepsilon}
=
\lim _{\varepsilon\to 0}\int V\ \frac{d(\rho+\varepsilon\chi)-d\rho}{\varepsilon}
=
\int V\ d\chi.
$$
Hence
$$
\frac{\delta\mathcal V}{\delta \rho}(\rho)=V.
$$
Finally,
$$
\begin{aligned}
\frac{d}{d\varepsilon}\mathcal W(\rho+\varepsilon\chi)\Big| _{\varepsilon=0}
&=
\lim _{\varepsilon\to 0}\frac{\mathcal W(\rho+\varepsilon\chi)-\mathcal W(\rho)}{\varepsilon}
=
\lim _{\varepsilon\to 0}\frac{1}{2\varepsilon}\iint W(x-y)\Big[d(\rho+\varepsilon\chi)(x)\ d(\rho+\varepsilon\chi)(y)-d\rho(x)\ d\rho(y)\Big]\\
&=
\frac12\iint W(x-y)\ d\chi(x)\ d\rho(y)+\frac12\iint W(x-y)\ d\rho(x)\ d\chi(y)
=
\iint W(x-y)\ d\rho(y)\ d\chi(x).
\end{aligned}
$$
where the last equality uses the symmetry $W(z)=W(-z)$. Hence
$$
\frac{\delta\mathcal W}{\delta \rho}(\rho)(x)
=
\int W(x-y)\ d\rho(y)
=
(W\star \rho)(x).
$$
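These three formulas can be sanity-checked by a finite difference on a grid. The sketch below is an illustration only: the grid on $[0,1]$, the choice $f(u)=u\log u$, the potential `V`, the kernel `W`, the density `rho` and the zero-mass perturbation `chi` are all hypothetical choices, and integrals are replaced by Riemann sums. We sum the three functionals, so by linearity the first variation should be $f'(\rho)+V+W\star\rho$.

```python
import numpy as np

# Midpoint grid on [0, 1]; integrals become Riemann sums with weight h.
n = 200
h = 1.0 / n
x = (np.arange(n) + 0.5) * h

f = lambda u: u * np.log(u)              # internal energy density
f_prime = lambda u: np.log(u) + 1.0
V = np.cos(2 * np.pi * x)                # hypothetical potential
W_matrix = (x[:, None] - x[None, :]) ** 2    # hypothetical symmetric kernel W(z) = z^2

def F(rho):
    """Discretized sum of the internal, potential and interaction energies."""
    internal = np.sum(f(rho)) * h
    potential = np.sum(V * rho) * h
    interaction = 0.5 * (rho @ W_matrix @ rho) * h * h
    return internal + potential + interaction

rho = 1.0 + 0.5 * np.sin(2 * np.pi * x)  # positive density with unit mass
chi = np.cos(4 * np.pi * x)              # perturbation with zero total mass

eps = 1e-6
# Central difference approximating d/d(eps) F(rho + eps * chi) at eps = 0.
finite_difference = (F(rho + eps * chi) - F(rho - eps * chi)) / (2 * eps)

# Pairing of the claimed first variation f'(rho) + V + W * rho with chi.
first_variation = f_prime(rho) + V + (W_matrix @ rho) * h
pairing = np.sum(first_variation * chi) * h
print(finite_difference, pairing)
```

The two printed numbers should agree up to the finite-difference error, which is consistent with the computations in the proof above.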
Proposition 1. Let $c:\Omega\times\Omega\to\mathbb R$ be a continuous cost function. Then the functional
$$
\rho\mapsto \mathcal T_c(\rho,\nu)
$$
is convex and its subdifferential at $\rho_0$ coincides with the set of Kantorovich potentials
$$
\left\{ \varphi\in C^0(\Omega):\ \int \varphi\ d\rho_0+\int \varphi^c\ d\nu = \mathcal T_c(\rho_0,\nu) \right\}.
$$
Moreover, if there is a unique $c$-concave Kantorovich potential $\varphi$ from $\rho_0$ to $\nu$, up to additive constants, then we also have
$$
\frac{\delta \mathcal T_c(\cdot,\nu)}{\delta \rho}(\rho_0)=\varphi.
$$
For (1), the objective function is just
$$
F(\rho)+\frac{1}{\tau} \mathcal T_{c/2}(\rho,\rho_k^\tau).
$$
The optimality condition is
$$
\frac{\delta F}{\delta \rho}(\rho_{k+1}^\tau)+\frac{\varphi}{\tau}=\text{const},
$$
where $\varphi$ is the Kantorovich potential for the cost $\dfrac{c}{2}=\dfrac12|x-y|^2$.
Combining this with the fact that the optimal transport map from $\rho_{k+1}^\tau$ to $\rho_k^\tau$ is $T(x)=x-\nabla\varphi(x)$, we get the discrete velocity
$$
v(x) := \frac{x-T(x)}{\tau}=\frac{\nabla\varphi(x)}{\tau} = -\nabla\left( \frac{\delta F}{\delta \rho} (\rho_{k+1}^\tau) \right)(x).
$$
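This suggests that, in the limit $\tau\to 0$, we find a solution of the continuity equation with velocity $v_t=-\nabla\left[\frac{\delta F}{\delta \rho}(\rho_t)\right]$, that is,
$$
\partial_t \rho_t-\nabla\cdot\left(\rho_t \nabla\left[\frac{\delta F}{\delta \rho}(\rho_t)\right]\right)=0.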
$$
Examples:
- For $\mathcal F(\rho)=\int f(\rho(x))\ dx$ with $f(u)=u\log u$, we have
$$
\frac{\delta\mathcal F}{\delta \rho}(\rho)=f'(\rho)=\log\rho+1,\qquad
\nabla\frac{\delta\mathcal F}{\delta \rho}(\rho)=\frac{\nabla\rho}{\rho},
$$
so that $\rho\,\nabla\frac{\delta\mathcal F}{\delta \rho}(\rho)=\nabla\rho$ and we obtain
$$
\partial_t \rho_t=\Delta \rho_t
$$
which is just the heat equation.
- For $F(\rho)=\int f(\rho(x))\ dx+\int V(x)\ d\rho(x)$ with the same $f(u)=u\log u$, we get
$$
\frac{\delta F}{\delta \rho}(\rho)=\log\rho+1+V
\Longrightarrow
\nabla\frac{\delta F}{\delta \rho}=\frac{\nabla\rho}{\rho}+\nabla V.
$$
We get the Fokker–Planck equation
$$
\partial_t \rho_t-\Delta \rho_t-\nabla\cdot(\rho_t\nabla V)=0.
$$
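As a worked check of the Fokker–Planck example, assume that $e^{-V}$ is integrable on $\Omega$ and consider the Gibbs density $\rho_\infty$ (the normalizing constant $Z$ is notation introduced here). Its first variation is constant, so the velocity vanishes and $\rho_\infty$ is a stationary solution:
$$
\rho_\infty=\frac{e^{-V}}{Z},\quad Z=\int_\Omega e^{-V}\ dx
\quad\Longrightarrow\quad
\frac{\delta F}{\delta \rho}(\rho_\infty)=\log\rho_\infty+1+V=1-\log Z,
$$
$$
\Delta \rho_\infty+\nabla\cdot(\rho_\infty\nabla V)
=
\nabla\cdot\left(\rho_\infty\nabla\left[\log\rho_\infty+V\right]\right)
=0.
$$
This matches the optimality condition $\dfrac{\delta F}{\delta \rho}=\text{const}$ that characterizes minimizers of $F$ among probability densities.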
Reference
Santambrogio, F. Euclidean, metric, and Wasserstein gradient flows: an overview. Bull. Math. Sci. 7, 87–154 (2017).
The cover image in this article was taken on North Stradbroke Island, Brisbane, Australia.
https://handsteinwang.github.io/2026/04/16/gradient-flows-4/
