Gradient Flows in Wasserstein Space

This article introduces gradient flows in Wasserstein space.

In the Wasserstein space $W_p(X) := (\mathcal P_p(X), W_p)$, standard considerations from fluid mechanics tell us that the density $\mu_t$ of a family of particles moving with velocity field $v_t$ satisfies the continuity equation
$$
\partial_t \mu_t + \nabla \cdot (\mu_t v_t) = 0
$$
with a vector field $v_t\in L^p(\mu_t;\mathbb R^d)$. The goal is to relate the $L^p(\mu_t)$ norm of $v_t$ to the metric derivative $|\mu'|(t)$ of the curve $t\mapsto\mu_t$.

Theorem 1. Let $(\mu_t)_{t\in[0,1]}$ be an absolutely continuous curve in $W_p(\Omega)$ (for $p>1$, and $\Omega\subset \mathbb R^d$ an open domain). Then for a.e. $t\in[0,1]$, there exists a vector field $v_t\in L^p(\mu_t;\mathbb R^d)$ such that

  • the continuity equation $\partial_t \mu_t + \nabla\cdot(\mu_t v_t)=0$ is satisfied in the sense of distributions,
  • for a.e. $t$, we have
    $$
    \Vert v_t\Vert_{L^p(\mu_t)} \le |\mu'|(t),
    $$
    where $|\mu'|(t)$ denotes the metric derivative at time $t$ of the curve $t\mapsto \mu_t$ w.r.t. the distance $W_p$.

Conversely, if $(\mu _t)$ is a family of measures in $\mathcal P_p (\Omega)$ and for each $t$, we have a vector field $v_t\in L^p(\mu_t;\mathbb R^d)$ with $\int_0^1 \Vert v_t\Vert _{L^p(\mu_t)}\ dt<+\infty$ solving

$$
\partial_t \mu_t + \nabla\cdot(\mu_t v_t)=0,
$$

then $(\mu _t)$ is absolutely continuous in $W _p(\Omega)$ and for a.e. $t$, we have

$$
|\mu'|(t)\le \Vert v_t\Vert_{L^p(\mu_t)}.
$$

Note that as a consequence of the second part of the statement, the vector field $v_t$ introduced in the first part must satisfy
$$
\Vert v_t\Vert_{L^p(\mu_t)} = |\mu'|(t).
$$
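The equality between the metric derivative and the velocity norm can be checked numerically in a simple setting. The following is a minimal sketch, assuming a one-dimensional curve of uniform empirical measures whose particles move with constant velocities; the particle data and the helper `wasserstein_1d` are hypothetical and chosen only for illustration. It relies on the fact that, on the real line, the sorted matching is optimal for the cost $|x-y|^p$ with $p\ge 1$.

```python
def wasserstein_1d(xs, ys, p=2):
    """W_p between two uniform empirical measures on the real line.

    In one dimension the monotone (sorted) matching is optimal for
    the cost |x - y|^p with p >= 1, so sorting both samples suffices.
    """
    assert len(xs) == len(ys)
    cost = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys)))
    return (cost / len(xs)) ** (1.0 / p)


# Hypothetical particle data: initial positions x_i and constant velocities v_i.
positions = [0.0, 1.0, 2.0, 3.0]
velocities = [0.5, -0.2, 0.1, 0.4]


def mu(t):
    """Particle positions at time t; mu_t is their uniform empirical measure."""
    return [x + t * v for x, v in zip(positions, velocities)]


p = 2
t, h = 0.3, 1e-3

# Finite-difference approximation of the metric derivative |mu'|(t).
metric_derivative = wasserstein_1d(mu(t), mu(t + h), p) / h

# L^p(mu_t) norm of the velocity field, which takes the value v_i at particle i.
velocity_norm = (sum(abs(v) ** p for v in velocities) / len(velocities)) ** (1.0 / p)

print(metric_derivative, velocity_norm)
```

The two printed numbers agree up to rounding because the particles keep their ordering over the short time step, so the sorted matching pairs each particle with its own later position.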

McCann’s displacement interpolation

Theorem 2. If $\Omega\subset \mathbb R^d$ is convex, then all the spaces $W_p(\Omega)$ are length spaces. Moreover, if $\mu$ and $\nu$ belong to $W_p(\Omega)$ and $\gamma$ is an optimal transport plan from $\mu$ to $\nu$ for the cost $c_p(x,y)=|x-y|^p$, then the curve
$$
\mu^\gamma(t):=(\pi_t)_\sharp \gamma
$$
where $\pi_t:\Omega\times\Omega\to\Omega$ is given by
$$
\pi_t(x,y)=(1-t)x+ty
$$
is a constant-speed geodesic from $\mu$ to $\nu$. In the case $p>1$, all constant-speed geodesics are of this form, and, if $\mu$ is absolutely continuous, then there is only one geodesic and it has the form
$$
\mu_t=(T_t)_\sharp \mu,\qquad T_t=(1-t)\operatorname{id}+tT,
$$
where $T$ is the optimal transport map from $\mu$ to $\nu$. In this case, the velocity field $v_t$ of the geodesic $\mu_t$ is given by
$$
v_t=(T-\operatorname{id})\circ (T_t)^{-1}.
$$

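In particular, for $p=2$ and the cost $\frac12|x-y|^2$, we have $v_0=-\nabla\varphi$ at $t=0$ and $v_1=\nabla\psi$ at $t=1$, where $\varphi$ is the Kantorovich potential in the transport from $\mu$ to $\nu$ and $\psi=\varphi^c$. Indeed, $T=\operatorname{id}-\nabla\varphi$ and $T^{-1}=\operatorname{id}-\nabla\psi$, so $v_1=(T-\operatorname{id})\circ T^{-1}=\operatorname{id}-T^{-1}=\nabla\psi$.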
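To make the displacement interpolation concrete, here is a minimal sketch in one dimension, assuming two uniform empirical measures with the same number of points (the sample values are hypothetical). In this setting the optimal map $T$ sends the $i$-th smallest source point to the $i$-th smallest target point, and the interpolated measure moves each point along a straight segment.

```python
def wasserstein_1d(xs, ys, p=2):
    """W_p between two uniform empirical measures on the real line (sorted matching)."""
    assert len(xs) == len(ys)
    cost = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys)))
    return (cost / len(xs)) ** (1.0 / p)


# Hypothetical samples representing mu (source) and nu (target).
mu0 = [0.0, 0.4, 1.0, 1.5]
mu1 = [2.0, 2.1, 3.5, 5.0]

# Monotone optimal map T: the i-th smallest source point goes to the
# i-th smallest target point.
src, dst = sorted(mu0), sorted(mu1)


def interpolate(t):
    """Support of mu_t = ((1 - t) id + t T)_# mu."""
    return [(1 - t) * x + t * y for x, y in zip(src, dst)]


total = wasserstein_1d(mu0, mu1)
checks = []
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    dist = wasserstein_1d(mu0, interpolate(t))
    checks.append((t, dist, t * total))
    print(t, dist, t * total)
```

Each row shows $W_2(\mu_0,\mu_t)$ next to $t\,W_2(\mu,\nu)$; the two columns coincide, which is the constant-speed property on this example.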

Using the characterization of constant-speed geodesics as minimizers of a strictly convex kinetic energy, we obtain the following:

  • Looking for an optimal transport for the cost $c(x,y)=|x-y|^p$ is equivalent to looking for constant-speed geodesics in $W_p$.
  • Constant-speed geodesics may be found by minimizing $\int_0^1 |\mu'|(t)^p\ dt$.
  • In the case of $W_p$, we have $|\mu'|(t)^p=\int_\Omega |v_t|^p\ d\mu_t$, where $v$ is a velocity field solving the continuity equation together with $\mu$.

As a consequence of these considerations, for $p>1$, solving the kinetic energy minimization problem
$$
\min \left\{ \int_0^1 \int_\Omega |v_t|^p\ d\rho_t\ dt \ ; \ \partial_t \rho_t+\nabla\cdot(\rho_t v_t)=0,\ \rho_0=\mu,\ \rho_1=\nu \right\}
$$
selects constant-speed geodesics connecting $\mu$ to $\nu$ and hence allows one to find the optimal transport between $\mu$ and $\nu$. The minimal value equals $W_p^p(\mu,\nu)$. This is usually called the Benamou–Brenier formula.
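The kinetic energy minimization can be illustrated on particles. The sketch below is only a sanity check under simplifying assumptions, not a solver: it works in one dimension with $p=2$, uses hypothetical sample points, and compares the discrete kinetic energy of the straight-line particle paths with that of one arbitrarily perturbed competitor sharing the same endpoints. The helper names `action`, `straight_path` and `perturbed_path` are introduced here for illustration.

```python
import math

# Hypothetical sorted samples of mu and nu; point i of mu is sent to point i of nu.
src = [0.0, 0.4, 1.0, 1.5]
dst = [2.0, 2.1, 3.5, 5.0]

# W_2^2 for uniform empirical measures in 1D (sorted matching).
w2_squared = sum((y - x) ** 2 for x, y in zip(src, dst)) / len(src)


def straight_path(t):
    """Particles moving at constant speed along segments (the geodesic)."""
    return [(1 - t) * x + t * y for x, y in zip(src, dst)]


def perturbed_path(t, amplitude=0.3):
    """A competitor with the same endpoints: add a bump that vanishes at t = 0 and t = 1."""
    bump = amplitude * math.sin(math.pi * t)
    return [z + bump for z in straight_path(t)]


def action(path, steps=200):
    """Discrete kinetic energy: sum over time steps of mean_i |dx_i|^2 / dt."""
    dt = 1.0 / steps
    total = 0.0
    for k in range(steps):
        before, after = path(k * dt), path((k + 1) * dt)
        total += sum((b - a) ** 2 for a, b in zip(before, after)) / len(before) / dt
    return total


straight = action(straight_path)
perturbed = action(perturbed_path)
print(w2_squared, straight, perturbed)
```

The straight-line paths reproduce $W_2^2(\mu,\nu)$, while the perturbed paths cost strictly more, as expected from the strict convexity of the kinetic energy.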

Minimizing Movement Scheme in the Wasserstein Space and Evolution PDEs

From now on, we work in $W_2(\Omega)$ and consider the minimizing movement scheme (MMS)
$$
\rho_{k+1}^\tau \in \operatorname{argmin}_\rho \left\{ F(\rho)+\frac{W_2^2(\rho,\rho_k^\tau)}{2\tau} \right\}. \tag{1}
$$

We also denote
$$
\mathcal T_c(\rho,\nu):=\min\left\{ \int c(x,y)\ d\gamma,\ \gamma\in\Pi(\rho,\nu) \right\},
$$
for $\nu=\rho_k^\tau$, $c(x,y)=|x-y|^2$.
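For intuition on the scheme (1), consider the special case $F(\rho)=\int V\,d\rho$. The minimization then decouples particle by particle, because minimizing over $\rho$ and over plans in $\Pi(\rho_k^\tau,\rho)$ amounts to choosing, for each point $x$ of $\rho_k^\tau$, the point $y$ that minimizes $V(y)+|x-y|^2/(2\tau)$. The following minimal sketch assumes the hypothetical choice $V(x)=x^2/2$ in one dimension, for which this minimizer is $y=x/(1+\tau)$, and compares the iterates with the exact flow $x(t)=x_0e^{-t}$ of $\dot x=-V'(x)$.

```python
import math


def mms_step_quadratic(particles, tau):
    """One step of scheme (1) for F(rho) = int V d(rho) with V(x) = x^2 / 2.

    Each particle x moves to argmin_y V(y) + |x - y|^2 / (2 tau) = x / (1 + tau),
    i.e. one implicit Euler step for x' = -V'(x).
    """
    return [x / (1.0 + tau) for x in particles]


# Hypothetical initial particle positions (support of rho_0).
particles = [-1.0, 0.5, 2.0]
final_time = 1.0
exact = [x * math.exp(-final_time) for x in particles]

errors = []
for steps in (10, 100, 1000):
    tau = final_time / steps
    current = list(particles)
    for _ in range(steps):
        current = mms_step_quadratic(current, tau)
    error = max(abs(a - b) for a, b in zip(current, exact))
    errors.append(error)
    print(tau, error)
```

The error shrinks roughly in proportion to $\tau$, which is the behaviour one expects from an implicit Euler discretization.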

Given a functional $G:\mathcal P(\Omega)\to\mathbb R$, we call $\dfrac{\delta G}{\delta \rho}(\rho)$, if it exists, the unique (up to additive constants) function such that
$$
\left.\frac{d}{d\varepsilon}G(\rho+\varepsilon\chi)\right|_{\varepsilon=0}
=
\int \frac{\delta G}{\delta \rho}(\rho)\ d\chi,
$$
for every perturbation $\chi$ such that, at least for $\varepsilon\in[0,\bar\varepsilon]$, $\rho+\varepsilon\chi\in\mathcal P(\Omega)$. The function $\dfrac{\delta G}{\delta \rho}(\rho)$ is called the first variation of the functional $G$ at $\rho$.

Examples: Let $f:\mathbb R\to\mathbb R$ be a convex superlinear function, and let $V:\Omega\to\mathbb R$ and $W:\mathbb R^d\to\mathbb R$ be regular enough, with $W$ symmetric, i.e.
$$
W(z)=W(-z).
$$

We have three functionals
$$
\mathcal F(\rho)=
\begin{cases}
\displaystyle \int f(\rho(x))\ dx & \text{if }\rho\ll \operatorname{leb},\\
+\infty & \text{otherwise},
\end{cases}
$$
$$
\mathcal V(\rho)=\int V\ d\rho,\qquad
\mathcal W(\rho)=\frac12\iint W(x-y)\ d\rho(x)\ d\rho(y).
$$

Then we have
$$
\frac{\delta\mathcal F}{\delta \rho}(\rho)=f'(\rho),\qquad
\frac{\delta\mathcal V}{\delta \rho}(\rho)=V,\qquad
\frac{\delta\mathcal W}{\delta \rho}(\rho)=W\star \rho.
$$

Proof. We have

$$
\frac{d}{d\varepsilon} \mathcal F(\rho+\varepsilon\chi)\Big| _{\varepsilon=0}
=
\lim _{\varepsilon\to 0} \frac{\mathcal F(\rho+\varepsilon\chi)-\mathcal F(\rho)}{\varepsilon}
=
\lim _{\varepsilon\to 0}\int \frac{f(\rho(x)+\varepsilon\chi(x))-f(\rho(x))}{\varepsilon}\ dx
=
\int f^\prime(\rho(x))\chi(x)\ dx
=
\int f^\prime(\rho(x))\ d\chi(x).
$$

Hence

$$
\frac{\delta\mathcal F}{\delta \rho}(\rho)=f'(\rho).
$$

Moreover,
$$
\frac{d}{d\varepsilon}\mathcal V(\rho+\varepsilon\chi)\Big| _{\varepsilon=0}
=
\lim _{\varepsilon\to 0}\frac{\mathcal V(\rho+\varepsilon\chi)-\mathcal V(\rho)}{\varepsilon}
=
\lim _{\varepsilon\to 0}\int V\ \frac{d(\rho+\varepsilon\chi)-d\rho}{\varepsilon}
=
\int V\ d\chi.
$$
Hence
$$
\frac{\delta\mathcal V}{\delta \rho}(\rho)=V.
$$

Finally,

$$
\begin{aligned}
\frac{d}{d\varepsilon}\mathcal W(\rho+\varepsilon\chi)\Big| _{\varepsilon=0}
&=
\lim _{\varepsilon\to 0}\frac{\mathcal W(\rho+\varepsilon\chi)-\mathcal W(\rho)}{\varepsilon}
=
\lim _{\varepsilon\to 0}\frac{1}{2\varepsilon}\iint W(x-y)\Big[d((\rho+\varepsilon\chi)(x))\ d((\rho+\varepsilon\chi)(y))-d\rho(x)\ d\rho(y)\Big]\\
&=
\frac12\iint W(x-y)\ d\chi(x)\ d\rho(y)+\frac12\iint W(x-y)\ d\rho(x)\ d\chi(y)
=
\iint W(x-y)\ d\rho(y)\ d\chi(x).
\end{aligned}
$$

Hence
$$
\frac{\delta\mathcal W}{\delta \rho}(\rho)
=
\int W(x-y)\rho(y)\ dy
=
W\star \rho.
$$
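The three formulas for the first variation can be checked numerically. The sketch below is a hedged illustration on a uniform grid of $[0,1]$: the density `rho`, the zero-mass perturbation `chi`, and the choices $f(u)=u\log u$, $V(x)=x^2$, $W(z)=z^2$ are all assumptions made for this example. It compares a central finite difference of each functional in the direction $\chi$ with the pairing $\int \frac{\delta G}{\delta\rho}\,d\chi$.

```python
import math

n = 50
h = 1.0 / n
xs = [(i + 0.5) * h for i in range(n)]            # cell midpoints of [0, 1]

rho = [1.0 + 0.5 * math.sin(2 * math.pi * x) for x in xs]   # positive density
chi = [math.cos(2 * math.pi * x) for x in xs]               # zero-mass perturbation


def V(x):
    return x * x


def W(z):
    return z * z   # symmetric: W(z) = W(-z)


def entropy(r):       # discrete version of F with f(u) = u log u
    return sum(u * math.log(u) for u in r) * h


def potential(r):     # discrete version of V(rho) = int V d(rho)
    return sum(V(x) * u for x, u in zip(xs, r)) * h


def interaction(r):   # discrete version of W(rho) = 1/2 iint W(x - y) d(rho) d(rho)
    return 0.5 * sum(W(x - y) * u * w
                     for x, u in zip(xs, r) for y, w in zip(xs, r)) * h * h


def directional_derivative(G, eps=1e-5):
    """Central difference of eps -> G(rho + eps * chi) at eps = 0."""
    plus = [u + eps * c for u, c in zip(rho, chi)]
    minus = [u - eps * c for u, c in zip(rho, chi)]
    return (G(plus) - G(minus)) / (2 * eps)


def pairing(first_variation):
    """Discrete version of int (delta G / delta rho) d(chi)."""
    return sum(g * c for g, c in zip(first_variation, chi)) * h


results = [
    (directional_derivative(entropy), pairing([math.log(u) + 1 for u in rho])),
    (directional_derivative(potential), pairing([V(x) for x in xs])),
    (directional_derivative(interaction),
     pairing([sum(W(x - y) * w for y, w in zip(xs, rho)) * h for x in xs])),
]
for numeric, formula in results:
    print(numeric, formula)
```

In each row the finite difference and the formula agree to within the discretization error of the difference quotient.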

Proposition 1. Let $c:\Omega\times\Omega\to\mathbb R$ be a continuous cost function. Then the functional
$$
\rho\mapsto \mathcal T_c(\rho,\nu)
$$
is convex and its subdifferential at $\rho_0$ coincides with the set of Kantorovich potentials
$$
\left\{ \varphi\in C^0(\Omega):\ \int \varphi\ d\rho_0+\int \varphi^c\ d\nu = \mathcal T_c(\rho_0,\nu) \right\}.
$$

Moreover, if there is a unique $c$-concave Kantorovich potential $\varphi$ from $\rho_0$ to $\nu$, up to additive constants, then we also have
$$
\frac{\delta \mathcal T_c(\rho,\nu)}{\delta \rho}(\rho_0)=\varphi.
$$

For (1), the objective function is just
$$
F(\rho)+\frac{1}{\tau} \mathcal T_{c/2}(\rho,\rho_k^\tau).
$$

The optimality condition is
$$
\frac{\delta F}{\delta \rho}(\rho_{k+1}^\tau)+\frac{\varphi}{\tau}=\text{const},
$$
where $\varphi$ is the Kantorovich potential for the cost $\dfrac{c}{2}=\dfrac12|x-y|^2$.

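Since $\varphi$ is the potential for the transport from $\rho_{k+1}^\tau$ to $\rho_k^\tau$, the optimal transport map is $T(x)=x-\nabla\varphi(x)$. Combining this with the optimality condition, we get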

$$
-v(x) := \frac{T(x)-x}{\tau}=-\frac{\nabla\varphi(x)}{\tau} = \nabla\left( \frac{\delta F}{\delta \rho} (\rho) \right)(x).
$$

This suggests that, in the limit $\tau\to 0$, we find a solution of
$$
\partial_t \rho_t-\nabla\cdot\left(\rho_t \nabla\left[\frac{\delta F}{\delta \rho}(\rho_t)\right]\right)=0.
$$

Examples:

  • For $\mathcal F(\rho)=\int f(\rho(x))\ dx$ with $f(u)=u\log u$, we have

$$
\frac{\delta\mathcal F}{\delta \rho}(\rho)=f'(\rho)=\log\rho+1,\qquad
\nabla\frac{\delta\mathcal F}{\delta \rho}(\rho)=\frac{\nabla\rho}{\rho},
$$
so that $\nabla\cdot\left(\rho\,\dfrac{\nabla\rho}{\rho}\right)=\Delta\rho$ and the limit equation becomes
$$
\partial_t \rho_t=\Delta \rho_t,
$$
which is just the heat equation.

  • For $F(\rho)=\int f(\rho(x))\ dx+\int V(x)\ d\rho(x)$ with the same $f(u)=u\log u$, we get
    $$
    \frac{\delta F}{\delta \rho}(\rho)=\log\rho+1+V
    \Longrightarrow
    \nabla\frac{\delta F}{\delta \rho}=\frac{\nabla\rho}{\rho}+\nabla V.
    $$

    We get the Fokker–Planck equation
    $$
    \partial_t \rho_t-\Delta \rho_t-\nabla\cdot(\rho_t\nabla V)=0.
    $$
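A quick consistency check on the Fokker–Planck example: if $e^{-V}$ is integrable, the Gibbs density $\rho_\infty=e^{-V}/Z$ makes $\log\rho+1+V$ constant, so the gradient of the first variation vanishes and $\rho_\infty$ is stationary. The sketch below verifies this on a grid, assuming the hypothetical potential $V(x)=x^2/2$ truncated to the interval $[-3,3]$.

```python
import math

# Hypothetical setting: V(x) = x^2 / 2 on a uniform grid of [-3, 3].
xs = [-3.0 + 6.0 * i / 200 for i in range(201)]
dx = xs[1] - xs[0]


def V(x):
    return 0.5 * x * x


# Gibbs density rho = exp(-V) / Z, normalized with a simple Riemann sum.
weights = [math.exp(-V(x)) for x in xs]
Z = sum(weights) * dx
rho = [w / Z for w in weights]

# First variation of F(rho) = int rho log rho dx + int V d(rho).
first_variation = [math.log(r) + 1 + V(x) for r, x in zip(rho, xs)]

# If the first variation is constant, its gradient is zero and rho is stationary.
spread = max(first_variation) - min(first_variation)
print(spread)
```

The printed spread is zero up to rounding, consistent with $\frac{\nabla\rho}{\rho}+\nabla V=0$ for the Gibbs density.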

Reference

Santambrogio, F. Euclidean, metric, and Wasserstein gradient flows: an overview. Bull. Math. Sci. 7, 87–154 (2017).

The cover image in this article was taken on North Stradbroke Island, Brisbane, Australia.

Author

Handstein Wang

Posted on

2026-04-16

Updated on

2026-04-16
