The General Theory in Metric Spaces
In this article, we introduce the general theory of gradient flows in metric spaces.
Preliminaries
- Metric derivative. Given a curve $x:[0,T]\to X$ valued in a metric space, we can define the speed
$$
|x'|(t):=\lim_{h\to0}\frac{d(x(t),x(t+h))}{|h|}
$$
provided the limit exists.
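As a quick sanity check (our example): for the curve $x(t)=(\cos t,\sin t)$ in $\mathbb{R}^2$ with the Euclidean distance, $d(x(t),x(t+h))=2|\sin(h/2)|$, so
$$
|x'|(t)=\lim_{h\to0}\frac{2|\sin(h/2)|}{|h|}=1;
$$
more generally, for a Lipschitz curve in $\mathbb{R}^n$ the metric derivative coincides a.e. with the norm of the usual derivative.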
- Slope and modulus of the gradient.
Upper gradient. We say that $g:X\to[0,+\infty]$ is an upper gradient of $F$ if, for every Lipschitz curve $x:[0,1]\to X$ and every $0\le t_0\le t_1\le 1$,
$$
|F(x(t_1))-F(x(t_0))|\le \int_{t_0}^{t_1} g(x(t))|x'|(t)\ dt.
$$
If $F$ is Lipschitz continuous, a possible choice is the local Lipschitz constant
$$
|\nabla F|(x):=\limsup_{y\to x}\frac{|F(y)-F(x)|}{d(x,y)}.
$$
- Descending slope (slope, for short):
$$
|\nabla^-F|(x):=\limsup_{y\to x}\frac{[F(x)-F(y)]_+}{d(x,y)}.
$$
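A simple example separating the two notions (ours): for $F(x)=|x|$ on $\mathbb{R}$, at the minimum point $x=0$ we have
$$
|\nabla F|(0)=\limsup_{y\to0}\frac{|y|}{|y|}=1,
\qquad
|\nabla^-F|(0)=\limsup_{y\to0}\frac{[-|y|]_+}{|y|}=0,
$$
since the descending slope only records directions along which $F$ decreases.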
- Geodesic convexity. On a geodesic metric space, we say a function $F$ is geodesically convex if for every pair of points $x(0),x(1)$ there exists a constant-speed geodesic $x$ connecting them such that, for every $t\in[0,1]$,
$$
F(x(t))\le (1-t)F(x(0))+tF(x(1)).
$$
Similarly, we say that $F$ is $\lambda$-geodesically convex if
$$
F(x(t))\le (1-t)F(x(0))+tF(x(1))-\lambda\frac{t(1-t)}{2}d^2(x(0),x(1)).
$$
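For instance (a standard Euclidean check, ours): in $\mathbb{R}^n$, constant-speed geodesics are the segments $x(t)=(1-t)x_0+tx_1$, and the identity
$$
|x(t)|^2=(1-t)|x_0|^2+t|x_1|^2-t(1-t)|x_0-x_1|^2
$$
shows that $F$ is $\lambda$-geodesically convex exactly when $x\mapsto F(x)-\frac{\lambda}{2}|x|^2$ is convex; for example, $F(x)=\frac12|x|^2$ is $1$-geodesically convex.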
Existence of a gradient flow
Let us suppose that the space $X$ and the function $F$ are such that every sublevel set $\{F\le c\}$ is compact in $X$, either for the topology induced by the distance $d$, or for a weaker topology for which $d$ is lower semi-continuous; $F$ is required to be l.s.c. in the same topology. This is the minimal framework to guarantee the existence of minimizers at each step of the minimizing movement scheme
$$
x_{k+1}^\tau\in\operatorname{argmin}_x\ F(x)+\frac{d^2(x,x_k^\tau)}{2\tau}.
$$
Even if estimate (11) is enough to provide compactness, and thus the existence of GMM (Generalized Minimizing Movements), it will never be enough to characterize the limit curve: indeed, it is satisfied by any discrete evolution where $x_{k+1}^\tau$ gives a better value than $x_k^\tau$, without any need for optimality.
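To make the scheme concrete, here is a minimal numerical sketch in the Euclidean case $X=\mathbb{R}^n$, $d(x,y)=|x-y|$ (our illustration; the function names and parameter values are placeholders, not from the survey):

```python
import numpy as np
from scipy.optimize import minimize

def minimizing_movement(F, x0, tau, n_steps):
    """Minimizing movement scheme: x_{k+1} minimizes F(x) + |x - x_k|^2/(2 tau)."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        xk = xs[-1]
        step = minimize(lambda x: F(x) + np.sum((x - xk) ** 2) / (2 * tau), xk)
        xs.append(step.x)
    return np.array(xs)

# Example: F(x) = |x|^2/2, whose gradient flow is x(t) = e^{-t} x(0).
F = lambda x: 0.5 * np.sum(x ** 2)
traj = minimizing_movement(F, x0=[1.0, -2.0], tau=0.01, n_steps=100)
print(traj[-1])                              # discrete solution at t = 1
print(np.exp(-1.0) * np.array([1.0, -2.0]))  # exact gradient flow at t = 1
```

Each step is an implicit Euler step; for this particular $F$ it reduces to $x_{k+1}=x_k/(1+\tau)$, so after $N=t/\tau$ steps one gets $(1+\tau)^{-N}x_0\to e^{-t}x_0$.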
Variational interpolation (by De Giorgi). Once we fix $x_k^\tau$, for every $\theta\in(0,1]$, consider
$$
\min_x\ F(x)+\frac{d^2(x,x_k^\tau)}{2\theta\tau}
$$
and call $x(\theta)$ any minimizer of this problem and $\varphi(\theta)$ the minimal value.
Then we have
(1) for $\theta\to0^+$, we have $x(\theta)\to x_k^\tau$ and $\varphi(\theta)\to F(x_k^\tau)$.
Proof. Since $x(\theta)$ is the minimizer, we have
$$
F(x(\theta))\le F(x(\theta))+\frac{d^2(x(\theta),x_k^\tau)}{2\theta\tau}\le F(x_k^\tau).
$$
That is
$$
\frac{d^2(x(\theta),x_k^\tau)}{2\theta\tau}\le F(x_k^\tau)-F(x(\theta)).
$$
Since $x(\theta)\in\{F\le F(x_k^\tau)\}$, which is compact, and $F$ is l.s.c. (hence bounded from below on this set), the RHS above is bounded by some constant $C$. Therefore $d^2(x(\theta),x_k^\tau)\le 2C\theta\tau\to0$.
For $\varphi(\theta)$, we have $\varphi(\theta)\le F(x_k^\tau)$.
On the other hand
$$
\varphi(\theta)=F(x(\theta))+\frac{d^2(x(\theta),x_k^\tau)}{2\theta\tau}\ge F(x(\theta)).
$$
By the l.s.c. of $F$ and the convergence $x(\theta)\to x_k^\tau$, we have
$$
F(x_k^\tau)\le \liminf_{\theta\to0^+}F(x(\theta))
\le \liminf_{\theta\to0^+}\varphi(\theta)
\le \limsup_{\theta\to0^+}\varphi(\theta)\le F(x_k^\tau).
$$
Hence
$$
\lim_{\theta\to0^+}\varphi(\theta)=F(x_k^\tau).\quad \square
$$
(2) for $\theta=1$, we get back to the original problem with minimizer $x_{k+1}^\tau$.
(3) the function $\varphi$ is non-increasing and hence a.e. differentiable. Moreover,
$$
\varphi'(\theta)=-\frac{d^2(x(\theta),x_k^\tau)}{2\theta^2\tau},
$$
which also shows that, at every $\theta$ where $\varphi'(\theta)$ exists, $d(x(\theta),x_k^\tau)$ does not depend on the choice of the minimizer $x(\theta)$.
Proof. Let $G_\theta(x):=F(x)+\dfrac{d^2(x,x_k^\tau)}{2\theta\tau}$, then
$$
\varphi(\theta)=\min_x G_\theta(x)=G_\theta(x(\theta)).
$$
For sufficiently small $h>0$, we have
$$
\varphi(\theta+h)\le G_{\theta+h}(x(\theta))
=F(x(\theta))+\frac{d^2(x(\theta),x_k^\tau)}{2(\theta+h)\tau}
$$
and
$$
\varphi(\theta)=F(x(\theta))+\frac{d^2(x(\theta),x_k^\tau)}{2\theta\tau}.
$$
Hence
$$
\frac{\varphi(\theta+h)-\varphi(\theta)}{h}
\le \frac{d^2(x(\theta),x_k^\tau)}{2\tau}\frac{\frac{1}{\theta+h}-\frac{1}{\theta}}{h}
= -\frac{d^2(x(\theta),x_k^\tau)}{2\theta(\theta+h)\tau}.
$$
Then
$$
\varphi'(\theta)=\lim_{h\to0^+}\frac{\varphi(\theta+h)-\varphi(\theta)}{h}
\le -\frac{d^2(x(\theta),x_k^\tau)}{2\theta^2\tau}.
\tag{$\star$}
$$
On the other hand
$$
\varphi(\theta)\le G_\theta(x(\theta+h))
=F(x(\theta+h))+\frac{d^2(x(\theta+h),x_k^\tau)}{2\theta\tau}.
$$
Meanwhile,
$$
\varphi(\theta+h)=F(x(\theta+h))+\frac{d^2(x(\theta+h),x_k^\tau)}{2(\theta+h)\tau}.
$$
We have
$$
\frac{\varphi(\theta+h)-\varphi(\theta)}{h}
\ge \frac{d^2(x(\theta+h),x_k^\tau)}{2\tau}\frac{\frac{1}{\theta+h}-\frac{1}{\theta}}{h}
= -\frac{d^2(x(\theta+h),x_k^\tau)}{2\theta(\theta+h)\tau}.
$$
Letting $h\to0^+$, we obtain
$$
\varphi'(\theta)\ge -\limsup_{h\to0^+}\frac{d^2(x(\theta+h),x_k^\tau)}{2\theta(\theta+h)\tau}.
\tag{$\star\star$}
$$
Since the points $x(\theta+h)$ all lie in the compact set $\{F\le F(x_k^\tau)\}$, there exists a sequence $h_j\to0^+$, which we may choose so as to realize the $\limsup$ in ($\star\star$), such that
$$
x(\theta+h_j)\to \bar x.
$$
Since $F$ and $d$ are l.s.c., $G_\theta$ is also l.s.c. Hence
$$
\begin{aligned}
G_\theta(\bar x)\le \liminf_{j\to\infty}G_\theta(x(\theta+h_j))&=\liminf_{j\to\infty}\left[G_{\theta+h_j}(x(\theta+h_j))
+\frac{d^2(x(\theta+h_j),x_k^\tau)}{2\tau}
\left(\frac{1}{\theta}-\frac{1}{\theta+h_j}\right)\right]\\
&=\lim_{j\to\infty}\varphi(\theta+h_j)=\varphi(\theta)\le G_\theta(\bar x),
\end{aligned}
$$
Here we used that $d^2(x(\theta+h_j),x_k^\tau)$ is bounded (as in the proof of (1)), that $\varphi$ is differentiable, and hence continuous, at $\theta$, and that $\varphi(\theta)=\min G_\theta\le G_\theta(\bar x)$. Therefore
$$
G_\theta(\bar x)=\varphi(\theta).
$$
so $\bar x$ is a minimizer of $G_\theta$. On the other hand,
$$
\varphi(\theta+h_j)=F(x(\theta+h_j))
+\frac{d^2(x(\theta+h_j),x_k^\tau)}{2(\theta+h_j)\tau},
$$
then by the l.s.c. of $F$ and $d$,
$$
\varphi(\theta)=\liminf_{j\to\infty}\varphi(\theta+h_j)
\ge F(\bar x)+\frac{d^2(\bar x,x_k^\tau)}{2\theta\tau}
=G_\theta(\bar x)=\varphi(\theta).
$$
Hence all the lower-semicontinuity inequalities above become equalities; in particular,
$$
\lim_{j\to\infty}\frac{d^2(x(\theta+h_j),x_k^\tau)}{2(\theta+h_j)\tau}
=\frac{d^2(\bar x,x_k^\tau)}{2\theta\tau}.
$$
Therefore,
$$
d(x(\theta+h_j),x_k^\tau)\to d(\bar x,x_k^\tau),
$$
and then by ($\star\star$),
$$
\varphi'(\theta)\ge -\lim_{j\to\infty}\frac{d^2(x(\theta+h_j),x_k^\tau)}{2\theta(\theta+h_j)\tau}
= -\frac{d^2(\bar x,x_k^\tau)}{2\theta^2\tau}.
$$
Combining ($\star$) and this last inequality, and noting that ($\star$) holds for every minimizer $x(\theta)$ while $\bar x$ is itself a minimizer, we get
$$
\varphi'(\theta)= -\frac{d^2(x(\theta),x_k^\tau)}{2\theta^2\tau}.\quad \square
$$
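As a sanity check of (3) (our illustration, with placeholder values): for $F(x)=\frac12x^2$ on $\mathbb{R}$, the minimizer and the value have closed forms, $x(\theta)=\frac{x_k^\tau}{1+\theta\tau}$ and $\varphi(\theta)=\frac{(x_k^\tau)^2}{2(1+\theta\tau)}$, so we can compare a numerical derivative of $\varphi$ with $-d^2(x(\theta),x_k^\tau)/(2\theta^2\tau)$:

```python
import numpy as np

xk, tau = 2.0, 0.1  # placeholder values for x_k^tau and tau

# Closed forms for F(x) = x^2/2 on the real line:
x_theta = lambda th: xk / (1 + th * tau)
phi = lambda th: xk ** 2 / (2 * (1 + th * tau))

theta, h = 0.5, 1e-6
numerical = (phi(theta + h) - phi(theta - h)) / (2 * h)        # central difference
predicted = -((x_theta(theta) - xk) ** 2) / (2 * theta ** 2 * tau)
print(numerical, predicted)  # the two values agree up to O(h^2)
```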
(4) we have
$$
|\nabla^-F|(x(\theta))\le \frac{d(x(\theta),x_k^\tau)}{\theta\tau}.
$$
Proof. Since $x(\theta)$ is optimal, for every $y$,
$$
F(y)+\frac{d^2(y,x_k^\tau)}{2\theta\tau}
\ge F(x(\theta))+\frac{d^2(x(\theta),x_k^\tau)}{2\theta\tau}.
$$
We have
$$
F(x(\theta))-F(y)\le \frac{1}{2\theta\tau}
\big(d^2(y,x_k^\tau)-d^2(x(\theta),x_k^\tau)\big)\le\frac{1}{2\theta\tau}\big(d(x(\theta),x_k^\tau)+d(y,x_k^\tau)\big)d(x(\theta),y),
$$
where the last step uses the triangle inequality $d(y,x_k^\tau)-d(x(\theta),x_k^\tau)\le d(x(\theta),y)$. Letting $y\to x(\theta)$ and using the continuity of $d(\cdot,x_k^\tau)$, we get
$$
\limsup_{y\to x(\theta)}
\frac{[F(x(\theta))-F(y)]_+}{d(x(\theta),y)}
\le \frac{1}{2\theta\tau}\cdot 2d(x(\theta),x_k^\tau)
=\frac{d(x(\theta),x_k^\tau)}{\theta\tau}.\quad \square
$$
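In the one-dimensional example $F(x)=\frac12x^2$ used in the sketch above, where $x(\theta)=\frac{x_k^\tau}{1+\theta\tau}$, this bound is sharp (our check):
$$
|\nabla^-F|(x(\theta))=|x(\theta)|=\frac{|x_k^\tau|}{1+\theta\tau},
\qquad
\frac{d(x(\theta),x_k^\tau)}{\theta\tau}=\frac{|x_k^\tau|\,\theta\tau}{(1+\theta\tau)\,\theta\tau}=\frac{|x_k^\tau|}{1+\theta\tau},
$$
so (4) holds with equality there.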
(5) since a monotone function may have a singular part in its derivative, we only have (here $\varphi(0):=\lim_{\theta\to0^+}\varphi(\theta)=F(x_k^\tau)$ by (1), and $\varphi(1)=F(x_{k+1}^\tau)+\frac{d^2(x_{k+1}^\tau,x_k^\tau)}{2\tau}$ by (2))
$$
\varphi(0)-\varphi(1)\ge -\int_0^1 \varphi'(\theta)\ d\theta.
$$
Together with the inequality (which follows from (4))
$$
-\varphi'(\theta)=\frac{d^2(x(\theta),x_k^\tau)}{2\theta^2\tau}
\ge \frac{\tau}{2}|\nabla^-F|^2(x(\theta)),
$$
we get
$$
F(x_k^\tau)-\left(F(x_{k+1}^\tau)+\frac{d^2(x_{k+1}^\tau,x_k^\tau)}{2\tau}\right)
\ge \frac{\tau}{2}\int_0^1 |\nabla^-F|^2(x(\theta))\ d\theta.
$$
If we sum up over $k=0,1,\dots$ and take the limit $\tau\to0$, then under suitable assumptions we can prove that every GMM $x$ satisfies
$$
F(x(t))+\frac12\int_0^t |x'|^2(r)\ dr
+\frac12\int_0^t |\nabla^-F|^2(x(r))\ dr
\le F(x(0)).
$$
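To see the discrete mechanism behind this (a sketch of the omitted step; the rigorous limit passage is carried out in the reference below): summing the previous inequality over $k=0,\dots,N-1$, the terms $F(x_k^\tau)$ telescope, and we obtain
$$
F(x_0^\tau)-F(x_N^\tau)\ge \sum_{k=0}^{N-1}\frac{d^2(x_{k+1}^\tau,x_k^\tau)}{2\tau}
+\frac{\tau}{2}\sum_{k=0}^{N-1}\int_0^1 |\nabla^-F|^2(x_k(\theta))\ d\theta,
$$
where $x_k(\theta)$ is De Giorgi's interpolation on the $k$-th step. The first sum is a discrete version of $\frac12\int|x'|^2$ (each term equals $\frac{\tau}{2}\big(\frac{d(x_{k+1}^\tau,x_k^\tau)}{\tau}\big)^2$), and the second one of $\frac12\int|\nabla^-F|^2(x)$.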
But this is not exactly the EDE (Energy Dissipation Equality):
- it is only an inequality;
- it only compares the instants $t$ and $0$, instead of arbitrary instants $t$ and $s$.
If we want equality for every pair $(t,s)$ we need to require the slope to be an upper gradient. Indeed, in this case,
$$
F(x(0))-F(x(t))
\le \int_0^t |\nabla^-F|(x(r))|x'|(r)\ dr
$$
and then, by Young's inequality $ab\le\frac12a^2+\frac12b^2$,
$$
F(x(0))-F(x(t))\le \frac12\int_0^t |x'|^2(r)\ dr+\frac12\int_0^t |\nabla^-F|^2(x(r))\ dr,
$$
which is the reverse of the inequality above; hence we get equality for the pair $(t,0)$. Subtracting the equalities for $t$ and $s$ then gives, for $s<t$,
$$
F(x(t))+\frac12\int_s^t |x'|^2(r)\ dr
+\frac12\int_s^t |\nabla^-F|^2(x(r))\ dr
=F(x(s)).
$$
Remarkably, it happens that the single assumption that $F$ is $\lambda$-geodesically convex makes all these assumptions hold true.
Uniqueness and contractivity
Concerning the relation between EDE and EVI, following Savaré, we have:
- All curves which are gradient flows in the EVI sense also satisfy the EDE condition.
- The EDE condition is not in general enough to guarantee uniqueness of the gradient flow. A simple example: take $X=\mathbb{R}^2$ with the $\ell^\infty$ distance
$$
d((x_1,x_2),(y_1,y_2))=|x_1-y_1|\vee |x_2-y_2|
$$
and take $F(x_1,x_2)=x_1$; then any curve $(x_1(t),x_2(t))$ with $x_1'(t)=-1$ and $|x_2'(t)|\le1$ satisfies EDE (a numerical illustration follows the proof below).
Proof. It is easy to see that $|\nabla^-F|(x)=1$ for all $x\in X$ (taking $y=(x_1-\varepsilon,x_2)$ gives the ratio $1$, and the ratio never exceeds $1$ since $[x_1-y_1]_+\le d(x,y)$), and
$$
|x'|(t)=\lim_{h\to0}\frac{d(x(t+h),x(t))}{|h|}
=\lim_{h\to0}\frac{\max\{|x_1(t+h)-x_1(t)|,\ |x_2(t+h)-x_2(t)|\}}{|h|}
=\max\{|x_1'(t)|,\ |x_2'(t)|\}.
$$
If $x_1'(t)=-1$ and $|x_2'(t)|\le1$, then $|x'|(t)=1$. Now we need to check the EDE: for $s<t$,
$$
F(x(s))-F(x(t))
=\int_s^t \frac12|x'|^2(r)+\frac12|\nabla^-F|^2(x(r))\ dr.
$$
The LHS is $x_1(s)-x_1(t)$; since $x_1'(t)=-1$, we have $x_1(t)=x_1(0)-t$ for all $t$, hence
$$
\text{LHS}=(x_1(0)-s)-(x_1(0)-t)=t-s. \quad \text{RHS}=\int_s^t \frac12+\frac12\ dr=t-s.
$$
Hence the EDE holds; since $x_2(\cdot)$ can be any $1$-Lipschitz function, the gradient flow is not unique. $\square$
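A quick numerical illustration of the resulting non-uniqueness (our sketch; the choice $x_2(t)=\sin t$ is just one arbitrary $1$-Lipschitz example):

```python
import numpy as np

# Two distinct curves from the same initial point (0, 0):
# both have x1'(t) = -1 and |x2'(t)| <= 1, hence both satisfy EDE.
t = np.linspace(0.0, 1.0, 1001)
curve_a = np.stack([-t, np.zeros_like(t)], axis=1)  # x2(t) = 0
curve_b = np.stack([-t, np.sin(t)], axis=1)         # x2(t) = sin(t), 1-Lipschitz

# Energy balance check: F(x(0)) - F(x(T)) should equal T,
# since |x'| = |nabla^- F| = 1 along both curves.
for curve in (curve_a, curve_b):
    print(curve[0, 0] - curve[-1, 0])  # F = x1; both print 1.0 = T
```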
- Existence of gradient flows in the EDE sense is easy to obtain.
- The EVI condition is in general too strong to get existence, but it always guarantees uniqueness and stability, as the following proposition shows.
Proposition. If two curves $x,y:[0,T]\to X$ satisfy the $\mathrm{EVI}_\lambda$ condition, then we have
$$
\frac{d}{dt}d(x(t),y(t))^2\le -2\lambda\ d(x(t),y(t))^2
$$
and
$$
d(x(t),y(t))\le e^{-\lambda t}d(x(0),y(0)).
$$
Proof. By the $\mathrm{EVI}_\lambda$ condition, for every fixed $z\in X$,
$$
\frac{d}{dt}\frac12 d(x(t),z)^2\le F(z)-F(x(t))-\frac{\lambda}{2}d(x(t),z)^2.
$$
Take $z=y(t_0)$:
$$
\left.\frac{d}{dt}\frac12 d(x(t),y(t_0))^2\right|_{t=t_0}
\le F(y(t_0))-F(x(t_0))-\frac{\lambda}{2}d(x(t_0),y(t_0))^2.
$$
Similarly,
$$
\left.\frac{d}{ds}\frac12 d(x(t_0),y(s))^2\right|_{s=t_0}
\le F(x(t_0))-F(y(t_0))-\frac{\lambda}{2}d(x(t_0),y(t_0))^2.
$$
Adding them up and using the chain rule (which can be justified for a.e. $t_0$), we get
$$
\frac{d}{dt}\frac12 d(x(t),y(t))^2\Big| _{t=t_0}
=\frac{d}{dt}\frac12 d(x(t),y(t_0))^2\Big| _{t=t_0}
+\frac{d}{ds}\frac12 d(x(t_0),y(s))^2\Big| _{s=t_0}\le -\lambda\ d(x(t_0),y(t_0))^2.
$$
Hence
$$
\frac{d}{dt}d(x(t),y(t))^2\le -2\lambda\ d(x(t),y(t))^2.
$$
By Gronwall's inequality,
$$
d(x(t),y(t))\le e^{-\lambda t}d(x(0),y(0)).\quad \square
$$
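A numerical illustration of the contraction estimate in the smooth Euclidean case (our sketch; for $F(x)=\frac{\lambda}{2}|x|^2$ the gradient flow is known to satisfy $\mathrm{EVI}_\lambda$, and all parameter values are placeholders):

```python
import numpy as np

lam, dt, T = 1.0, 1e-3, 2.0
grad_F = lambda x: lam * x  # F(x) = (lam/2)|x|^2 is lam-convex

def flow(x0):
    """Explicit Euler discretization of the gradient flow x' = -grad F(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(int(T / dt)):
        x = x - dt * grad_F(x)
    return x

x0, y0 = np.array([1.0, 0.0]), np.array([0.0, 2.0])
d0 = np.linalg.norm(x0 - y0)
dT = np.linalg.norm(flow(x0) - flow(y0))
print(dT, np.exp(-lam * T) * d0)  # contraction: dT <= e^{-lam T} d0 (up to O(dt))
```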
If we want a satisfying theory for gradient flows which includes uniqueness, we just need to prove the existence of curves which satisfy the EVI condition, accepting that this will probably require additional assumptions.
This assumption, which we will call $C^2G^2$ (Compatible Convexity along Generalized Geodesics), is the following: suppose that for every pair $(x_0,x_1)$ and every $y\in X$, there is a curve $x(t)$ connecting $x(0)=x_0$ to $x(1)=x_1$ such that
- ($F$ is $\lambda$-convex along $x(\cdot)$)
$$
F(x(t))\le (1-t)F(x_0)+tF(x_1)-\lambda\frac{t(1-t)}{2}d^2(x_0,x_1)
$$
- ($x\mapsto d^2(x,y)$ is $2$-convex along $x(\cdot)$)
$$
d^2(x(t),y)\le (1-t)d^2(x_0,y)+t\ d^2(x_1,y)-t(1-t)d^2(x_0,x_1).
$$
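In a Hilbert space, taking $x(t)=(1-t)x_0+tx_1$, the second condition actually holds with equality: expanding the squared norm gives
$$
|x(t)-y|^2=(1-t)|x_0-y|^2+t|x_1-y|^2-t(1-t)|x_0-x_1|^2,
$$
so in this trivial case the usual segments do the job (our remark); the point of the $C^2G^2$ assumption is to require such curves, called generalized geodesics, in genuinely metric settings such as the Wasserstein space.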
Reference
Santambrogio, F. Euclidean, metric, and Wasserstein gradient flows: an overview. Bull. Math. Sci. 7, 87–154 (2017).
The cover image in this article was taken on Tinian Island, a U.S. territory in the Northern Mariana Islands.