Posted 2026-04-16 | Gradient Flows

Gradient Flows in Wasserstein Space

In this article, we introduce gradient flows in Wasserstein space.

In the Wasserstein space $W_p(X) := (\mathcal P_p(X), W_p)$, standard considerations from fluid mechanics tell us that the density $\mu_t$ of a family of particles moving with velocity $v_t$ satisfies the continuity equation
$$
\partial_t \mu_t + \nabla \cdot (\mu_t v_t) = 0
$$
with an $L^p$ vector field $v_t$. Moreover, we want to relate the $L^p$ norm of $v_t$ to the metric derivative $|\mu'|(t)$.

Theorem 1. Let $(\mu_t)_{t\in[0,1]}$ be an absolutely continuous curve in $W_p(\Omega)$ (for $p>1$, and $\Omega\subset \mathbb R^d$ an open domain). Then for a.e. $t\in[0,1]$, there exists a vector field $v_t\in L^p(\mu_t;\mathbb R^d)$ such that

(i) the continuity equation $\partial_t \mu_t + \nabla\cdot(\mu_t v_t)=0$ is satisfied in the sense of distributions;

(ii) for a.e. $t$, we have
$$
\Vert v_t\Vert_{L^p(\mu_t)} \le |\mu'|(t),
$$
where $|\mu'|(t)$ denotes the metric derivative at time $t$ of the curve $t\mapsto \mu_t$ w.r.t. the distance $W_p$.

Conversely, if $(\mu _t)$ is a family of measures in $\mathcal P_p (\Omega)$ and for each $t$, we have a vector field $v_t\in L^p(\mu_t;\mathbb R^d)$ with $\int_0^1 \Vert v_t\Vert _{L^p(\mu_t)}\ dt<+\infty$ solving

$$
\partial_t \mu_t + \nabla\cdot(\mu_t v_t)=0,
$$

then $(\mu _t)$ is absolutely continuous in $W _p(\Omega)$ and for a.e. $t$, we have

$$
|\mu'|(t)\le \Vert v_t\Vert_{L^p(\mu_t)}.
$$

Note that as a consequence of the second part of the statement, the vector field $v_t$ introduced in the first part must satisfy
$$
\Vert v_t\Vert_{L^p(\mu_t)} = |\mu'|(t).
$$
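
A simple case where both sides can be computed by hand may help here. This is only an illustration, under the assumptions (not made in the theorem) that $\Omega=\mathbb R^d$ and that $m:[0,1]\to\mathbb R^d$ is a $C^1$ curve: translate a fixed measure $\mu_0\in\mathcal P_p(\mathbb R^d)$ along $m$, i.e. take $\mu_t=(\operatorname{id}+m(t)-m(0))_\sharp\mu_0$. The velocity field is constant in space, and the Wasserstein distance between two translates of the same measure is the length of the shift:
$$
v_t(x)=m'(t),\qquad
\Vert v_t\Vert_{L^p(\mu_t)}=|m'(t)|,\qquad
W_p(\mu_s,\mu_t)=|m(t)-m(s)|,
$$
so that
$$
|\mu'|(t)=\lim_{s\to t}\frac{W_p(\mu_s,\mu_t)}{|s-t|}=|m'(t)|=\Vert v_t\Vert_{L^p(\mu_t)}.
$$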

McCann’s displacement interpolation

Theorem 2. If $\Omega\subset \mathbb R^d$ is convex, then all the spaces $W_p(\Omega)$ are length spaces and if $\mu$ and $\nu$ belong to $W_p(\Omega)$ and $\gamma$ is the optimal transport plan from $\mu$ to $\nu$ for the cost $c_p(x,y)=|x-y|^p$, then the curve
$$
\mu^\gamma(t):=(\pi_t)_\sharp \gamma
$$
where $\pi_t:\Omega\times\Omega\to\Omega$ is given by
$$
\pi_t(x,y)=(1-t)x+ty
$$
is a constant-speed geodesic from $\mu$ to $\nu$. In the case $p>1$, all constant-speed geodesics are of this form, and, if $\mu$ is absolutely continuous, then there is only one geodesic and it has the form
$$
\mu_t=(T_t)_\sharp \mu,\qquad T_t=(1-t)\operatorname{id}+tT,
$$
where $T$ is the optimal transport map from $\mu$ to $\nu$. In this case, the velocity field $v_t$ of the geodesic $\mu_t$ is given by
$$
v_t=(T-\operatorname{id})\circ (T_t)^{-1}.
$$

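In particular, for $t=0$, we have $v_0=-\nabla\varphi$ and for $t=1$, we have $v_1=\nabla\psi$, where $\varphi$ is the Kantorovich potential in the transport from $\mu$ to $\nu$ and $\psi=\varphi^c$. The sign changes because $v_1=\operatorname{id}-T^{-1}$ and, for the quadratic cost $\tfrac12|x-y|^2$, $T=\operatorname{id}-\nabla\varphi$ while $T^{-1}=\operatorname{id}-\nabla\psi$.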
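
The constant-speed property can be checked numerically in the simplest setting. The following is a minimal sketch, assuming two empirical measures on the real line with the same number of equally weighted atoms, where the monotone (sorted) coupling is optimal for $c_p$; the helper names `w2_sorted` and `displacement_interpolation` and the sample points are illustrative choices, not part of the theorem.

```python
import math

def w2_sorted(xs, ys):
    """W_2 between two empirical measures on the line with equally many,
    equally weighted atoms (assumption: the sorted coupling is optimal in 1D)."""
    xs, ys = sorted(xs), sorted(ys)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def displacement_interpolation(xs, ys, t):
    """Atoms of mu_t = (pi_t)_# gamma, with gamma the sorted coupling
    and pi_t(x, y) = (1 - t) x + t y."""
    xs, ys = sorted(xs), sorted(ys)
    return [(1 - t) * x + t * y for x, y in zip(xs, ys)]

mu = [0.0, 1.0, 4.0]   # illustrative atoms
nu = [2.0, 3.0, 9.0]
total = w2_sorted(mu, nu)

# Constant speed: W_2(mu_s, mu_t) should equal |t - s| * W_2(mu, nu).
for s, t in [(0.0, 0.25), (0.25, 0.5), (0.1, 0.9)]:
    d = w2_sorted(displacement_interpolation(mu, nu, s),
                  displacement_interpolation(mu, nu, t))
    print(s, t, d, (t - s) * total)
```

The two printed distances on each line should agree up to rounding, which is exactly the statement that $t\mapsto\mu^\gamma(t)$ is a constant-speed geodesic.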

Using the characterization of constant-speed geodesics as minimizers of a strictly convex kinetic energy, we can summarize the situation as follows:

(i) Looking for an optimal transport for the cost $c(x,y)=|x-y|^p$ is equivalent to looking for constant-speed geodesics in $W_p$.

(ii) Constant-speed geodesics may be found by minimizing $\int_0^1 |\mu'|(t)^p\ dt$.

(iii) In the case of $W_p$, we have $|\mu'|(t)^p=\int_\Omega |v_t|^p\ d\mu_t$, where $v$ is a velocity field solving the continuity equation together with $\mu$.

As a consequence of these considerations, for $p>1$, solving the kinetic energy minimization problem
$$
\min \left\{ \int_0^1 \int_\Omega |v_t|^p\ d\rho_t\ dt \ ; \ \partial_t \rho_t+\nabla\cdot(\rho_t v_t)=0,\ \rho_0=\mu,\ \rho_1=\nu \right\}
$$
selects constant-speed geodesics connecting $\mu$ to $\nu$ and hence allows one to find the optimal transport between $\mu$ and $\nu$. This is what is usually called the Benamou–Brenier formula.
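
For $p=2$ and particle measures, the kinetic energy can be evaluated directly. The sketch below is only an illustration under stated assumptions: three equally weighted particles on the line, already matched in sorted order, and two hand-picked competitors (the straight-line path and a perturbed path named `wiggly`, both hypothetical). It compares the discretized action with $W_2^2(\mu,\nu)$ and does not solve the minimization problem itself.

```python
import math

def action(path, n_steps=2000):
    """Discretized kinetic energy: sum over time steps of
    (1/n) * sum_i |x_i(t_{k+1}) - x_i(t_k)|^2 / dt."""
    dt = 1.0 / n_steps
    total = 0.0
    prev = path(0.0)
    for k in range(1, n_steps + 1):
        cur = path(k * dt)
        total += sum((c - p) ** 2 for c, p in zip(cur, prev)) / (len(cur) * dt)
        prev = cur
    return total

mu = [0.0, 1.0, 4.0]   # illustrative atoms, sorted
nu = [2.0, 3.0, 9.0]

def straight(t):
    """Displacement interpolation: each particle moves on a straight line."""
    return [(1 - t) * x + t * y for x, y in zip(mu, nu)]

def wiggly(t, a=0.5):
    """Same endpoints, but with an extra oscillation of amplitude a."""
    return [p + a * math.sin(math.pi * t) for p in straight(t)]

w2_squared = sum((x - y) ** 2 for x, y in zip(mu, nu)) / len(mu)
print(action(straight), w2_squared)   # these two should agree
print(action(wiggly))                 # strictly larger
```

Both paths connect $\mu$ to $\nu$, but only the straight-line one attains $W_2^2(\mu,\nu)$; the extra oscillation costs roughly $a^2\pi^2/2$ in kinetic energy.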

Minimizing Movement Scheme in the Wasserstein Space and Evolution PDEs

From now on, we work in $W_2(\Omega)$ and consider the minimizing movement scheme (MMS)
$$
\rho_{k+1}^\tau \in \operatorname{argmin}_\rho \left\{ F(\rho)+\frac{W_2^2(\rho,\rho_k^\tau)}{2\tau} \right\}. \tag{1}
$$

and denote
$$
\mathcal T_c(\rho,\nu):=\min\left\{ \int c(x,y)\ d\gamma,\ \gamma\in\Pi(\rho,\nu) \right\},
$$
which we will use with $\nu=\rho_k^\tau$ and $c(x,y)=|x-y|^2$, so that $\mathcal T_c(\rho,\rho_k^\tau)=W_2^2(\rho,\rho_k^\tau)$.

Given a functional $G:\mathcal P(\Omega)\to\mathbb R$, we call $\dfrac{\delta G}{\delta \rho}(\rho)$, if it exists, the unique (up to additive constants) function such that
$$
\left.\frac{d}{d\varepsilon}G(\rho+\varepsilon\chi)\right|_{\varepsilon=0}
=
\int \frac{\delta G}{\delta \rho}(\rho)\ d\chi,
$$
for every perturbation $\chi$ such that, at least for $\varepsilon\in[0,\bar\varepsilon]$, $\rho+\varepsilon\chi\in\mathcal P(\Omega)$. The function $\dfrac{\delta G}{\delta \rho}(\rho)$ is called the first variation of the functional $G$ at $\rho$.

Examples: Let $f:\mathbb R\to\mathbb R$ be a convex superlinear function, and let $V:\Omega\to\mathbb R$ and $W:\mathbb R^d\to\mathbb R$ be regular enough, with $W$ symmetric, i.e.
$$
W(z)=W(-z).
$$

We consider the three functionals
$$
\mathcal F(\rho)=
\begin{cases}
\displaystyle \int f(\rho(x))\ dx & \text{if }\rho\ll \operatorname{leb},\\
+\infty & \text{otherwise},
\end{cases}
$$
$$
\mathcal V(\rho)=\int V\ d\rho,\qquad
\mathcal W(\rho)=\frac12\iint W(x-y)\ d\rho(x)\ d\rho(y).
$$

Then we have
$$
\frac{\delta\mathcal F}{\delta \rho}(\rho)=f'(\rho),\qquad
\frac{\delta\mathcal V}{\delta \rho}(\rho)=V,\qquad
\frac{\delta\mathcal W}{\delta \rho}(\rho)=W\star \rho.
$$

Proof. We have

$$
\frac{d}{d\varepsilon} \mathcal F(\rho+\varepsilon\chi)\Big| _{\varepsilon=0}
=
\lim _{\varepsilon\to 0} \frac{\mathcal F(\rho+\varepsilon\chi)-\mathcal F(\rho)}{\varepsilon}
=
\lim _{\varepsilon\to 0}\int \frac{f(\rho(x)+\varepsilon\chi(x))-f(\rho(x))}{\varepsilon}\ dx
=
\int f^\prime(\rho(x))\chi(x)\ dx
=
\int f^\prime(\rho(x))\ d\chi(x).
$$

Hence

$$
\frac{\delta\mathcal F}{\delta \rho}(\rho)=f'(\rho).
$$

Moreover,
$$
\frac{d}{d\varepsilon}\mathcal V(\rho+\varepsilon\chi)\Big| _{\varepsilon=0}
=
\lim _{\varepsilon\to 0}\frac{\mathcal V(\rho+\varepsilon\chi)-\mathcal V(\rho)}{\varepsilon}
=
\lim _{\varepsilon\to 0}\int V\ \frac{d(\rho+\varepsilon\chi)-d\rho}{\varepsilon}
=
\int V\ d\chi.
$$
Hence
$$
\frac{\delta\mathcal V}{\delta \rho}(\rho)=V.
$$

Finally,

$$
\begin{aligned}
\frac{d}{d\varepsilon}\mathcal W(\rho+\varepsilon\chi)\Big| _{\varepsilon=0}
&=
\lim _{\varepsilon\to 0}\frac{\mathcal W(\rho+\varepsilon\chi)-\mathcal W(\rho)}{\varepsilon}
=
\lim _{\varepsilon\to 0}\frac{1}{2\varepsilon}\iint W(x-y)\Big[d(\rho+\varepsilon\chi)(x)\ d(\rho+\varepsilon\chi)(y)-d\rho(x)\ d\rho(y)\Big]\\
&=
\frac12\iint W(x-y)\ d\chi(x)\ d\rho(y)+\frac12\iint W(x-y)\ d\rho(x)\ d\chi(y)
=
\iint W(x-y)\ d\rho(y)\ d\chi(x).
\end{aligned}
$$

Hence
$$
\frac{\delta\mathcal W}{\delta \rho}(\rho)
=
\int W(x-y)\rho(y)\ dy
=
W\star \rho.
$$
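
These three formulas can be sanity-checked with finite differences. The following is a rough sketch under assumptions that are not in the text: densities are sampled on a uniform grid of $[0,1]$, $V(x)=x^2$ and $W(z)=z^2$ are arbitrary smooth choices, $\chi$ has zero total mass, and since $\rho>0$ everywhere a two-sided difference in $\varepsilon$ is used. All function names are illustrative.

```python
import math

n = 50
dx = 1.0 / n
xs = [(i + 0.5) * dx for i in range(n)]

rho = [1.0 + 0.5 * math.sin(2 * math.pi * x) for x in xs]  # positive density, mass 1
chi = [math.cos(2 * math.pi * x) for x in xs]              # zero-mass perturbation

V = lambda x: x * x   # assumed potential
W = lambda z: z * z   # assumed interaction kernel, symmetric: W(z) = W(-z)

def entropy(r):
    """Grid version of F(rho) = int f(rho) dx with f(u) = u log u."""
    return sum(u * math.log(u) for u in r) * dx

def potential(r):
    """Grid version of V(rho) = int V d rho."""
    return sum(V(x) * u for x, u in zip(xs, r)) * dx

def interaction(r):
    """Grid version of W(rho) = (1/2) iint W(x - y) d rho d rho."""
    return 0.5 * sum(W(x - y) * u * v
                     for x, u in zip(xs, r) for y, v in zip(xs, r)) * dx * dx

def directional_derivative(G, eps=1e-5):
    """Central difference approximation of d/d eps G(rho + eps chi) at eps = 0."""
    plus = [u + eps * c for u, c in zip(rho, chi)]
    minus = [u - eps * c for u, c in zip(rho, chi)]
    return (G(plus) - G(minus)) / (2 * eps)

def pairing(first_variation):
    """Grid version of int (delta G / delta rho) d chi."""
    return sum(g * c for g, c in zip(first_variation, chi)) * dx

conv = [sum(W(x - y) * v for y, v in zip(xs, rho)) * dx for x in xs]  # W * rho

checks = {
    "entropy": (directional_derivative(entropy), pairing([math.log(u) + 1 for u in rho])),
    "potential": (directional_derivative(potential), pairing([V(x) for x in xs])),
    "interaction": (directional_derivative(interaction), pairing(conv)),
}
for name, (numeric, formula) in checks.items():
    print(name, numeric, formula)
```

In each row the finite-difference value and the pairing with the claimed first variation should agree to several digits.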

Proposition 1. Let $c:\Omega\times\Omega\to\mathbb R$ be a continuous cost function. Then the functional
$$
\rho\mapsto \mathcal T_c(\rho,\nu)
$$
is convex and its subdifferential at $\rho_0$ coincides with the set of Kantorovich potentials
$$
\left\{ \varphi\in C^0(\Omega):\ \int \varphi\ d\rho_0+\int \varphi^c\ d\nu = \mathcal T_c(\rho_0,\nu) \right\}.
$$

Moreover, if there is a unique $c$-concave Kantorovich potential $\varphi$ from $\rho_0$ to $\nu$, up to additive constants, then we also have
$$
\frac{\delta \mathcal T_c(\cdot,\nu)}{\delta \rho}(\rho_0)=\varphi.
$$

For (1), the objective function is just
$$
F(\rho)+\frac{1}{\tau} \mathcal T_{c/2}(\rho,\rho_k^\tau).
$$

The optimality condition is
$$
\frac{\delta F}{\delta \rho}(\rho_{k+1}^\tau)+\frac{\varphi}{\tau}=\text{const},
$$
where $\varphi$ is the Kantorovich potential for the cost $\dfrac{c}{2}=\dfrac12|x-y|^2$.

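Combining this with the fact that the optimal transport map from $\rho_{k+1}^\tau$ to $\rho_k^\tau$ is $T(x)=x-\nabla\varphi(x)$, we get the identity below. We write $-v$ for the ratio between displacement and time step because $T$ transports the new measure back to the old one, so the ratio is a backward velocity.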

$$
-v(x) := \frac{T(x)-x}{\tau}=-\frac{\nabla\varphi(x)}{\tau} = \nabla\left( \frac{\delta F}{\delta \rho} (\rho) \right)(x).
$$

This suggests that, in the limit $\tau\to 0$, we find a solution of
$$
\partial_t \rho_t-\nabla\cdot\left(\rho_t \nabla\left[\frac{\delta F}{\delta \rho}(\rho_t)\right]\right)=0.
$$
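
To see the scheme at work in the simplest possible case, here is a minimal sketch under strong assumptions that are not in the text: $F(\rho)=\int V\,d\rho$ with $V(x)=x^2/2$ on the real line, and $\rho_k^\tau$ an empirical measure. For a pure potential energy the minimization in (1) decouples atom by atom, so each atom $x$ moves to the minimizer of $V(y)+|y-x|^2/(2\tau)$, which here is $y=x/(1+\tau)$. The names `jko_step_potential` and `run_scheme` are hypothetical.

```python
import math

def jko_step_potential(points, tau):
    """One minimizing-movement step for F(rho) = int V d rho, V(x) = x^2 / 2.
    Each atom x moves to argmin_y V(y) + |y - x|^2 / (2 tau) = x / (1 + tau)."""
    return [x / (1.0 + tau) for x in points]

def run_scheme(points, tau, final_time):
    """Iterate the step until (approximately) time final_time."""
    steps = round(final_time / tau)
    for _ in range(steps):
        points = jko_step_potential(points, tau)
    return points

start = [-1.0, 0.5, 2.0]   # illustrative atoms
# The limit PDE d_t rho - div(rho grad V) = 0 moves atoms along x' = -V'(x) = -x.
exact = [x * math.exp(-1.0) for x in start]

for tau in (0.1, 0.01, 0.001):
    approx = run_scheme(start, tau, 1.0)
    err = max(abs(a - e) for a, e in zip(approx, exact))
    print(tau, err)
```

The error with respect to the characteristics of the limit equation shrinks as $\tau$ decreases, which is the behaviour the heuristic derivation predicts in this toy case.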

Examples:

For $\mathcal F(\rho)=\int f(\rho(x))\ dx$ with $f(u)=u\log u$, we have

$$
\frac{\delta\mathcal F}{\delta \rho}(\rho)=f'(\rho)=\log\rho+1,\qquad
\nabla\frac{\delta\mathcal F}{\delta \rho}(\rho)=\frac{\nabla\rho}{\rho},
$$
so that $\rho\,\nabla\dfrac{\delta\mathcal F}{\delta \rho}(\rho)=\nabla\rho$ and the limit equation becomes
$$
\partial_t \rho_t=\Delta \rho_t,
$$
which is just the heat equation.
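
Since the heat equation arises here as the gradient flow of the entropy $\int\rho\log\rho$, the entropy should decrease along solutions while the mass stays constant. The sketch below checks this on a discretization chosen purely for convenience (assumptions: one space dimension, a periodic grid on $[0,1]$, and an explicit finite-difference scheme with a time step below the stability limit); it is not the minimizing movement scheme itself, and `heat_step` is an illustrative name.

```python
import math

def heat_step(rho, dx, dt):
    """One explicit finite-difference step of d_t rho = d_xx rho on a periodic grid."""
    n = len(rho)
    return [rho[i] + dt / dx**2 * (rho[(i + 1) % n] - 2 * rho[i] + rho[i - 1])
            for i in range(n)]

def entropy(rho, dx):
    """Grid version of int rho log rho dx."""
    return sum(u * math.log(u) for u in rho) * dx

n = 64
dx = 1.0 / n
dt = 0.4 * dx * dx   # below the explicit stability limit dx^2 / 2
rho = [1.0 + 0.8 * math.cos(2 * math.pi * (i + 0.5) * dx) for i in range(n)]

history = []   # (mass, entropy) before each step
for _ in range(200):
    history.append((sum(rho) * dx, entropy(rho, dx)))
    rho = heat_step(rho, dx, dt)

print(history[0], history[-1])
```

With this time step every updated value is a convex combination of neighbouring values, so the discrete entropy cannot increase, mirroring the gradient-flow structure.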

For $F(\rho)=\int f(\rho(x))\ dx+\int V(x)\ d\rho(x)$, with the same $f(u)=u\log u$, we get
$$
\frac{\delta F}{\delta \rho}(\rho)=\log\rho+1+V
\Longrightarrow
\nabla\frac{\delta F}{\delta \rho}=\frac{\nabla\rho}{\rho}+\nabla V.
$$

We get the Fokker–Planck equation
$$
\partial_t \rho_t-\Delta \rho_t-\nabla\cdot(\rho_t\nabla V)=0.
$$
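
As a quick consistency check (a standard computation, added here under the assumption that $e^{-V}$ is integrable on $\Omega$), a stationary state can be read off from the condition that the first variation is constant:
$$
\log\rho_\infty+1+V=\text{const}
\quad\Longrightarrow\quad
\rho_\infty=\frac{e^{-V}}{Z},\qquad Z=\int_\Omega e^{-V(x)}\ dx.
$$
Indeed, for this density the flux vanishes identically,
$$
\nabla\rho_\infty+\rho_\infty\nabla V=-\rho_\infty\nabla V+\rho_\infty\nabla V=0,
$$
so $\Delta\rho_\infty+\nabla\cdot(\rho_\infty\nabla V)=0$ and $\rho_\infty$ does not evolve.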

Reference

Santambrogio, F. Euclidean, metric, and Wasserstein gradient flows: an overview. Bull. Math. Sci. 7, 87–154 (2017).

The cover image in this article was taken on North Stradbroke Island, Brisbane, Australia.

https://handsteinwang.github.io/2026/04/16/gradient-flows-4/

Author

Handstein Wang

Posted on

2026-04-16

Updated on

2026-04-16
