Gradient Flows in the Euclidean Space
In this article, we introduce some basic results on gradient flows in the Euclidean space.
Given a sufficiently smooth function $F:\mathbb{R}^n \to \mathbb{R}$ and a point $x_0 \in \mathbb{R}^n$, a gradient flow is a curve $x(t)$, starting at $x_0$ at $t=0$, which moves by choosing at each instant of time the direction that makes the function $F$ decrease as much as possible. More precisely, we consider the solution of the Cauchy problem
$$
\begin{cases}
x'(t) = -\nabla F(x(t)) & \text{for } t>0, \\\
x(0)=x_0
\end{cases}
\tag{1}
$$
This is a standard Cauchy problem which has a unique solution if $\nabla F$ is Lipschitz continuous.
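As a quick numerical illustration (a Python sketch; the quadratic $F(x)=x^2/2$ and the step size are our own illustrative choices), one can integrate (1) with an explicit Euler discretization and compare with the exact flow $x(t)=x_0 e^{-t}$:

```python
import math

def gradient_flow_explicit(grad_F, x0, tau, n_steps):
    """Integrate x'(t) = -grad_F(x(t)) by explicit Euler steps of size tau."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(traj[-1] - tau * grad_F(traj[-1]))
    return traj

# For F(x) = x^2/2 we have grad_F(x) = x, and the exact flow is x(t) = x0 * e^{-t}.
tau, n_steps, x0 = 1e-3, 1000, 2.0
T = tau * n_steps  # final time T = 1
traj = gradient_flow_explicit(lambda x: x, x0, tau, n_steps)
err = abs(traj[-1] - x0 * math.exp(-T))
print(err)  # small discretization error, of order tau
```

Along the discrete trajectory, $F$ decreases at every step, as expected of a gradient flow.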
If $F$ is convex, we can replace the gradient with the subdifferential, looking for an absolutely continuous curve $x:[0,T]\to \mathbb{R}^n$ such that
$$
\begin{cases}
x'(t)\in -\partial F(x(t)) & \text{for a.e. } t>0, \\\
x(0)=x_0
\end{cases}
\tag{2}
$$
where
$$
\partial F(x):=\left\{ p\in \mathbb{R}^n \mid F(y)\ge F(x)+p\cdot (y-x)\ \text{for all } y\in \mathbb{R}^n \right\}.
$$
Properties of the subdifferential
- $F$ is differentiable at $x$ iff $\partial F(x)=\{\nabla F(x)\}$.
- $\partial F(x)$ is a convex set.
- $\partial F(x)$ is non-empty for every $x$ in the interior of $\operatorname{dom} F$.
- For every $x_1,x_2$ and $p_1\in \partial F(x_1)$, $p_2\in \partial F(x_2)$,
$$
(x_1-x_2)\cdot (p_1-p_2)\ge 0.
$$
We denote by $\partial^\circ F(x)$ its element of minimal norm.
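For example, take $F(x)=|x|$ on $\mathbb{R}$. Then
$$
\partial F(x)=
\begin{cases}
\{1\} & \text{if } x>0, \\\
[-1,1] & \text{if } x=0, \\\
\{-1\} & \text{if } x<0,
\end{cases}
$$
so $\partial F(x)$ is a non-empty convex set at every point, it reduces to $\{\nabla F(x)\}$ exactly where $F$ is differentiable, and $\partial^\circ F(0)=0$.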
Proposition 1. Suppose that $F$ is convex and let $x_1,x_2$ be two solutions of (2). Then we have
$$
|x_1(t)-x_2(t)|\le |x_1(0)-x_2(0)|
$$
for every $t\ge 0$. In particular, this gives uniqueness of the solution of the Cauchy problem (2).
Proof. Let us consider
$$
g(t)=\frac12 |x_1(t)-x_2(t)|^2.
$$
Then we have
$$
g'(t)=(x_1(t)-x_2(t))\cdot (x_1'(t)-x_2'(t)).
$$
Since $x_1'(t)\in -\partial F(x_1(t))$ and $x_2'(t)\in -\partial F(x_2(t))$, the monotonicity property of the subdifferential gives
$$
g'(t)\le 0.
$$
Therefore
$$
g(t)\le g(0).\quad \square
$$
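Numerically, this contraction property can be observed on a discretized flow (a Python sketch; the convex function $F(x)=x^4/4$, the step size and the initial points are our own illustrative choices):

```python
def euler_step(x, tau, grad_F):
    """One explicit Euler step of the gradient flow x' = -grad_F(x)."""
    return x - tau * grad_F(x)

grad_F = lambda x: x ** 3   # F(x) = x^4 / 4 is convex
tau, n_steps = 0.01, 500
x1, x2 = 2.0, 1.0
dists = [abs(x1 - x2)]
for _ in range(n_steps):
    x1 = euler_step(x1, tau, grad_F)
    x2 = euler_step(x2, tau, grad_F)
    dists.append(abs(x1 - x2))
print(dists[0], dists[-1])  # the distance between the two trajectories never increases
```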
Remark. Recall that $F$ is said to be semi-convex if there exists $\lambda\in \mathbb{R}$ such that
$$
x\mapsto F(x)-\frac{\lambda}{2}|x|^2
$$
is convex.
For a $\lambda$-convex function $F$, we can define the subdifferential by
$$
\partial F(x)=\left\{ p\in \mathbb{R}^n \ \middle| \ F(y)\ge F(x)+p\cdot (y-x)+\frac{\lambda}{2}|y-x|^2 \ \text{for all } y\in \mathbb{R}^n \right\}.
$$
Again, we denote by $\partial^\circ F(x)$ the element of minimal norm of $\partial F(x)$.
If $F$ is $\lambda$-convex, then for $x_1,x_2,p_1,p_2$ with $p_i\in \partial F(x_i)$, $i=1,2$, one has
$$
(x_1-x_2)\cdot (p_1-p_2)\ge \lambda |x_1-x_2|^2.
$$
For two solutions $x_1,x_2$ of (2), this implies
$$
g'(t)\le -2\lambda g(t).
$$
By Gronwall's inequality,
$$
g(t)\le g(0)e^{-2\lambda t}.
$$
If $\lambda>0$, then $F$ is strongly (hence strictly) convex and admits a unique minimizer $\bar{x}$.
Take $x_2(t)\equiv \bar{x}$, which is also a solution since $0\in \partial F(\bar{x})$. Then we get
$$
|x_1(t)-\bar{x}|\le e^{-\lambda t}|x_1(0)-\bar{x}|.
$$
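This exponential decay can be checked numerically (a Python sketch; $F(x)=\frac{\lambda}{2}x^2$ with $\lambda=2$ and $\bar{x}=0$ is our own illustrative choice). The implicit Euler scheme $x_{k+1}=x_k/(1+\lambda\tau)$ for $x'=-\lambda x$ approximates the continuous decay rate:

```python
import math

lam, tau, n_steps, x0 = 2.0, 1e-3, 1000, 1.0   # F(x) = lam * x^2 / 2, minimizer xbar = 0
T = tau * n_steps
x = x0
for _ in range(n_steps):
    x = x / (1 + lam * tau)   # implicit Euler step for x' = -lam * x
print(x, math.exp(-lam * T) * x0)  # discrete endpoint vs exact decay e^{-lam*T}
```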
Proposition 2. Suppose that $F$ is $\lambda$-convex and let $x$ be a solution of (2). Then, for all times $t_0$ such that both $t\mapsto x(t)$ and $t\mapsto F(x(t))$ are differentiable at $t=t_0$, the subdifferential $\partial F(x(t_0))$ is contained in a hyperplane orthogonal to $x'(t_0)$. In particular, we have
$$
x'(t)=-\partial^\circ F(x(t)) \qquad \text{for a.e. } t.
$$
Proof. Let $t_0$ be as in the statement and let $p\in \partial F(x(t_0))$. By the definition of the subdifferential,
$$
F(x(t))\ge F(x(t_0))+p\cdot (x(t)-x(t_0))+\frac{\lambda}{2}|x(t)-x(t_0)|^2
\qquad \text{for all } t,
$$
with equality at $t=t_0$. Hence the function
$$
t\mapsto F(x(t))-F(x(t_0))-p\cdot (x(t)-x(t_0))-\frac{\lambda}{2}|x(t)-x(t_0)|^2
$$
is minimal at $t=t_0$, so its derivative vanishes there; since the quadratic term has vanishing derivative at $t=t_0$, this gives
$$
\frac{d}{dt}F(x(t))\Big|_{t=t_0}=p\cdot x'(t_0).
$$
Hence
$$
\partial F(x(t_0))\subseteq \left\{ p\in \mathbb{R}^n \mid p\cdot x'(t_0)=\mathrm{const} \right\},
$$
which, if $x'(t_0)\neq 0$, is a hyperplane orthogonal to $x'(t_0)$ (if $x'(t_0)=0$, the conclusion below is trivial).
Since for a.e. $t_0$,
$$
-x'(t_0)\in \partial F(x(t_0)),
$$
we have
$$
\mathrm{const}=-x'(t_0)\cdot x'(t_0)=-|x'(t_0)|^2.
$$
The hyperplane is
$$
\left\{ p\in \mathbb{R}^n \mid p\cdot x'(t_0)=-|x'(t_0)|^2 \right\}.
$$
Hence the orthogonal projection of $0$ onto the hyperplane containing $\partial F(x(t_0))$ is
$$
\frac{x'(t_0)}{|x'(t_0)|^2}\cdot \big(-|x'(t_0)|^2\big)=-x'(t_0).
$$
Since $-x'(t_0)$ belongs to $\partial F(x(t_0))$ and is the point of minimal norm of the whole hyperplane, it is in particular the element of minimal norm of $\partial F(x(t_0))$. This provides
$$
x'(t_0)=-\partial^\circ F(x(t_0))
$$
for a.e. $t_0$. $\square$
Discretization in time
Fix a small time-step parameter $\tau>0$ and define
$$
x_{k+1}^\tau \in \operatorname*{argmin} _x \left\{ F(x)+\frac{|x-x_k^\tau|^2}{2\tau} \right\}.
\tag{3}
$$
We call the sequence $(x_k^\tau) _k$ the minimizing movement scheme.
If $F$ is smooth, then
$$
x _{k+1}^\tau \in \operatorname*{argmin} _x \left\{ F(x)+\frac{| x-x_k^\tau |^2}{2\tau} \right\}
\implies
\nabla F(x _{k+1}^\tau)+\frac{x _{k+1}^\tau-x _k^\tau}{\tau}=0.
$$
Hence,
$$
\frac{x_{k+1}^\tau-x_k^\tau}{\tau}=-\nabla F(x_{k+1}^\tau),
$$
which is exactly the discrete-time implicit Euler scheme.
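A minimizing movement step can be computed numerically (a Python sketch; the ternary search and the quadratic $F(x)=x^2/2$ are our own illustrative choices). For this $F$, scheme (3) has the closed form $x_{k+1}^\tau=x_k^\tau/(1+\tau)$, i.e. exactly one implicit Euler step:

```python
def mms_step(F, x_prev, tau, lo, hi, iters=200):
    """One step of scheme (3): argmin_x F(x) + |x - x_prev|^2 / (2*tau),
    found by ternary search (assumes the objective is unimodal on [lo, hi])."""
    obj = lambda x: F(x) + (x - x_prev) ** 2 / (2 * tau)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if obj(m1) < obj(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

tau, x_prev = 0.1, 1.0
x_next = mms_step(lambda x: x ** 2 / 2, x_prev, tau, -2.0, 2.0)
print(x_next, x_prev / (1 + tau))  # numerical argmin vs closed-form implicit Euler step
```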
Given the ODE
$$
\begin{cases}
x'(t)=V(x(t)),\\\
x(0)=x_0,
\end{cases}
$$
the explicit scheme is
$$
x_{k+1}^\tau=x_k^\tau+\tau V(x_k^\tau), \qquad x_0^\tau=x_0,
$$
while the implicit scheme is
$$
x_{k+1}^\tau=x_k^\tau+\tau V(x_{k+1}^\tau), \qquad x_0^\tau=x_0.
$$
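The practical difference between the two schemes is stability for stiff problems. A Python sketch with $V(x)=-\lambda x$, $\lambda=10$, and a deliberately large step $\tau=0.3$ (so that $\lambda\tau>2$; these numbers are our own illustrative choices):

```python
lam, tau, n_steps = 10.0, 0.3, 20
x_explicit = x_implicit = 1.0
for _ in range(n_steps):
    x_explicit = x_explicit + tau * (-lam * x_explicit)  # multiplies by 1 - lam*tau = -2
    x_implicit = x_implicit / (1 + lam * tau)            # divides by 1 + lam*tau = 4
print(abs(x_explicit), abs(x_implicit))  # explicit blows up, implicit decays
```

The explicit iterate doubles in magnitude at every step, while the implicit one decays regardless of the step size; this unconditional stability is one motivation for the implicit scheme (3).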
Now we want to show that, as $\tau\to 0$, the sequence $(x_k^\tau)$ we found, suitably interpolated, converges to a solution of (2).
First, define
$$
v_{k+1}^\tau:=\frac{x_{k+1}^\tau-x_k^\tau}{\tau}.
$$
Then we set
$$
x^\tau(t):=x_{k+1}^\tau, \qquad \text{for } t\in (k\tau,(k+1)\tau],
$$
$$
\tilde x^\tau(t):=x_k^\tau+(t-k\tau)v_{k+1}^\tau, \qquad \text{for } t\in (k\tau,(k+1)\tau],
$$
$$
v^\tau(t):=v_{k+1}^\tau, \qquad \text{for } t\in (k\tau,(k+1)\tau].
$$
It is easy to see that $\tilde x^\tau$ is continuous and piecewise affine, satisfying
$$
(\tilde x^\tau)'=v^\tau.
$$
Moreover, $x^\tau$ is not continuous, but the optimality condition in (3) gives
$$
v^\tau(t)\in -\partial F(x^\tau(t)).
$$
By the definition of $x_{k+1}^\tau$ as a minimizer (comparing with $x=x_k^\tau$), we have
$$
F(x_{k+1}^\tau)+\frac{|x_{k+1}^\tau-x_k^\tau|^2}{2\tau}\le F(x_k^\tau).
\tag{4}
$$
If $F(x_0)<+\infty$ and $\inf F>-\infty$, summing over $k$ and telescoping, we get
$$
\sum_{k=0}^{\ell}\frac{|x_{k+1}^\tau-x_k^\tau|^2}{2\tau}
\le
F(x_0^\tau)-F(x_{\ell+1}^\tau)\le F(x_0)-\inf F=:C.
\tag{5}
$$
Note that
$$
\frac{|x_{k+1}^\tau-x_k^\tau|^2}{2\tau}
=
\frac{\tau}{2}\left|\frac{x_{k+1}^\tau-x_k^\tau}{\tau}\right|^2
=
\frac{\tau}{2}|v_{k+1}^\tau|^2
=
\frac12\int_{k\tau}^{(k+1)\tau}|(\tilde x^\tau)'(t)|^2\ dt.
$$
This means that we have
$$
\int_0^T \frac12 |(\tilde x^\tau)'(t)|^2\ dt\le C.
\tag{6}
$$
Hence $\tilde x^\tau$ is bounded in $H^1$ and $v^\tau$ in $L^2$. By the Hölder inequality, for $t\ge s$,
$$
|\tilde x^\tau(t)-\tilde x^\tau(s)|
=
\left|\int_s^t (\tilde x^\tau)'(u)\ du\right|
\le
\int_s^t |(\tilde x^\tau)'(u)|\ du
\le
\left(\int_s^t |(\tilde x^\tau)'(u)|^2\ du\right)^{1/2}|t-s|^{1/2}
\le C|t-s|^{1/2}.
\tag{7}
$$
Therefore the curves $\tilde x^\tau$ are equicontinuous, uniformly in $\tau>0$. Moreover,
$$
|\tilde x^\tau(t)-x_0|
=
|\tilde x^\tau(t)-\tilde x^\tau(0)|
\le Ct^{1/2}\le CT^{1/2}.
$$
Hence $\tilde x^\tau$ is also equibounded. Furthermore, since $x^\tau(t)=\tilde x^\tau(s)$ for $s=(k+1)\tau$ with $|s-t|\le \tau$ (when $t\in(k\tau,(k+1)\tau]$), we have
$$
|x^\tau(t)-\tilde x^\tau(t)|\le C\tau^{1/2}.
\tag{8}
$$
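Estimate (8) can be observed numerically (a Python sketch; $F(x)=x^2/2$, whose minimizing movement step has the closed form $x\mapsto x/(1+\tau)$, is our own illustrative choice). For these interpolations, the supremum of $|x^\tau(t)-\tilde x^\tau(t)|$ is approached as $t$ decreases to some $k\tau$, where it equals $|x_{k+1}^\tau-x_k^\tau|$:

```python
import math

def mms(x0, tau, T):
    """Minimizing movement for F(x) = x^2/2: each step is x -> x/(1+tau)."""
    xs = [x0]
    for _ in range(int(T / tau)):
        xs.append(xs[-1] / (1 + tau))
    return xs

def interpolation_gap(tau, x0=1.0, T=1.0):
    # sup_t |x^tau(t) - x~^tau(t)| equals the largest single-step increment
    xs = mms(x0, tau, T)
    return max(abs(b - a) for a, b in zip(xs, xs[1:]))

gaps = {tau: interpolation_gap(tau) for tau in [0.1, 0.01, 0.001]}
print(gaps)  # shrinks as tau -> 0, within the C * sqrt(tau) bound
```

For this smooth example the gap is in fact of order $\tau$, consistent with (and stronger than) the general bound $C\tau^{1/2}$ in (8).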
Proposition 3. Let $\tilde x^\tau$, $x^\tau$ and $v^\tau$ be constructed as above using the minimizing movement scheme. Suppose that $F(x_0)<+\infty$, $\inf F>-\infty$, and $F$ is lower semicontinuous. Then, up to a subsequence $\tau_j\to 0$ (still denoted by $\tau$), both $\tilde x^\tau$ and $x^\tau$ converge uniformly to the same curve $x\in H^1$, and $v^\tau$ converges weakly in $L^2$ to a vector function $v$ such that $x'=v$ and:
- if $F$ is $\lambda$-convex, we have
$$
v(t)\in -\partial F(x(t)) \qquad \text{for a.e. } t,
$$
i.e. $x$ is a solution of (2);
- if $F$ is $C^1$, we have
$$
v(t)=-\nabla F(x(t)) \qquad \text{for all } t,
$$
i.e. $x$ is a solution of (1).
Proof. By the above analysis, $\tilde x^\tau$ is continuous, equicontinuous and equibounded. By the Arzelà–Ascoli theorem, we get a uniformly converging subsequence (still denoted by $\tau$):
$$
\tilde x^{\tau}\to x
\qquad \text{as } \tau\to 0.
$$
By (8), $x^\tau$ also converges uniformly to the same limit $x$. Moreover, by (6), $v^\tau$ is bounded in $L^2$, which is a reflexive Banach space. Thus, up to an extra subsequence extraction (still denoted by $\tau$),
$$
v^\tau \rightharpoonup v \qquad \text{in } L^2 \quad \text{as } \tau\to 0.
$$
Therefore, passing to the limit in $(\tilde x^\tau)'=v^\tau$ in the sense of distributions,
$$
x’=v.
$$
To prove the first claim, suppose that $F$ is $\lambda$-convex and fix a point $y\in \mathbb{R}^n$. Since $v^\tau(t)\in -\partial F(x^\tau(t))$, we have
$$
F(y)\ge F(x^\tau(t)) - v^\tau(t)\cdot (y-x^\tau(t)) + \frac{\lambda}{2}|y-x^\tau(t)|^2.
$$
Hence, for every nonnegative measurable function $a:[0,T]\to \mathbb{R}_+$,
$$
\int_0^T a(t)\left( F(y)-F(x^\tau(t))+v^\tau(t)\cdot (y-x^\tau(t))-\frac{\lambda}{2}|y-x^\tau(t)|^2 \right)\ dt \ge 0.
$$
Since $x^\tau\to x$ uniformly (hence strongly in $L^2$), $v^\tau\rightharpoonup v$ weakly in $L^2$, and $F$ is lower semicontinuous, letting $\tau\to 0$ we get
$$
\int_0^T a(t)\left( F(y)-F(x(t))+v(t)\cdot (y-x(t))-\frac{\lambda}{2}|y-x(t)|^2 \right)\ dt \ge 0.
$$
From the arbitrariness of $a$, we get
$$
F(y)\ge F(x(t)) - v(t)\cdot (y-x(t)) + \frac{\lambda}{2}|y-x(t)|^2
\qquad \text{for a.e. } t.
$$
That is, there exists a negligible set $N_y$ such that
$$
G_y(t):=F(y)-F(x(t))+v(t)\cdot (y-x(t))-\frac{\lambda}{2}|y-x(t)|^2 \ge 0
\qquad \text{for } t\notin N_y.
$$
Let $D$ be a countable dense subset of the interior of $\operatorname{dom}F$ (where $F$ is continuous), and set
$$
N=\bigcup_{y\in D}N_y
$$
which is also negligible. For every $t\notin N$ and every $y\in \operatorname{dom}F$, there exists a sequence $y_m\in D$ with $y_m\to y$ and $F(y_m)\to F(y)$ as $m\to\infty$ (e.g. choosing $y_m$ along a segment joining $y$ to an interior point and using the convexity of $F$), and then
$$
G_y(t)=\lim_{m\to\infty} G_{y_m}(t)\ge 0
\qquad \text{for all } t\notin N.
$$
Hence,
$$
v(t)\in -\partial F(x(t)) \qquad \text{for all } t\notin N.
$$
To prove the second claim, suppose that $F\in C^1$. Then the optimality condition reads
$$
-\nabla F(x^\tau(t))=v^\tau(t)=(\tilde x^\tau)'(t).
$$
Since $x^\tau\to x$ uniformly and $\nabla F$ is continuous, $\nabla F(x^\tau(\cdot))\to \nabla F(x(\cdot))$ uniformly; combined with $v^\tau\rightharpoonup v$, this gives
$$
v(t)=-\nabla F(x(t)) \qquad \text{a.e. } t.
$$
Since $t\mapsto -\nabla F(x(t))$ is continuous, $x$ is in fact $C^1$ and the above equality holds for all $t$. $\square$
Remark. If the sequence $x_k^\tau$ is defined by
$$
x_{k+1}^\tau \in \operatorname*{argmin} _x \left\{ 2F\left(\frac{x+x _k^\tau}{2}\right)+\frac{|x-x _k^\tau |^2}{2\tau} \right\},
$$
then we have
$$
\frac{x_{k+1}^\tau-x_k^\tau}{\tau}
=
-\nabla F\left(\frac{x_{k+1}^\tau+x_k^\tau}{2}\right),
$$
and the convergence is of order $\tau^2$.
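A quick order-of-convergence check (a Python sketch; the quadratic $F(x)=x^2/2$ is our own illustrative choice, for which scheme (3) gives $x_{k+1}=x_k/(1+\tau)$ and the midpoint variant above gives $x_{k+1}=x_k(1-\tau/2)/(1+\tau/2)$):

```python
import math

def implicit_euler(x0, tau, n):
    for _ in range(n):
        x0 = x0 / (1 + tau)                      # scheme (3) for F(x) = x^2/2
    return x0

def implicit_midpoint(x0, tau, n):
    for _ in range(n):
        x0 = x0 * (1 - tau / 2) / (1 + tau / 2)  # midpoint variant for F(x) = x^2/2
    return x0

T, x0 = 1.0, 1.0
exact = x0 * math.exp(-T)
err_ie = {tau: abs(implicit_euler(x0, tau, int(T / tau)) - exact) for tau in [0.1, 0.05]}
err_mp = {tau: abs(implicit_midpoint(x0, tau, int(T / tau)) - exact) for tau in [0.1, 0.05]}
print(err_ie, err_mp)  # halving tau divides err_ie by ~2 and err_mp by ~4
```

Halving $\tau$ roughly halves the implicit Euler error (order $\tau$) while dividing the midpoint error by about four (order $\tau^2$), in agreement with the remark.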
Reference
Santambrogio, F. Euclidean, metric, and Wasserstein gradient flows: an overview. Bull. Math. Sci. 7, 87–154 (2017).
The cover image in this article was taken in Hobbiton, New Zealand.
