Well-posedness of the Mean Field PDE and Particle System for Stein Variational Gradient Descent

We consider the following interacting particle system in $\mathbb{R}^d$:

$$
\begin{aligned}
\dot{x}_i(t) &= -\frac{1}{N}\sum_{j=1}^{N}\nabla K\bigl(x_i(t)-x_j(t)\bigr) -\frac{1}{N}\sum_{j=1}^{N}K\bigl(x_i(t)-x_j(t)\bigr)\nabla V(x_j(t)), \\
x_i(0) &= x_i^0 \in \mathbb{R}^d, \qquad i=1,\dots,N.
\end{aligned} \tag{1}
$$

We refer to each of the $N$ functions $x_i(\cdot)$, which take values in $\mathbb{R}^d$, as a particle. The function $K : \mathbb{R}^d \to \mathbb{R}$ is a smooth, symmetric, and positive definite kernel. The function $V : \mathbb{R}^d \to \mathbb{R}$ is a smooth potential such that $e^{-V(x)}$ is integrable. More specific assumptions about $K$ and $V$ are given below.
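
To make the two sums in (1) concrete, here is a minimal NumPy sketch of the right-hand side together with a forward Euler step. The Gaussian kernel, the bandwidth `h`, the step size `dt`, and the function names are all assumptions chosen for illustration; the text only requires $K$ to be smooth, symmetric, and positive definite.

```python
import numpy as np


def gaussian_kernel(z, h=1.0):
    """Assumed kernel K(z) = exp(-|z|^2 / (2 h^2)) and its gradient.

    z has shape (..., d); returns K of shape (...) and grad K of shape (..., d).
    """
    k = np.exp(-np.sum(z**2, axis=-1) / (2.0 * h**2))
    grad_k = -(z / h**2) * k[..., None]
    return k, grad_k


def svgd_velocity(x, grad_V, h=1.0):
    """Right-hand side of (1) for particles x of shape (N, d).

    grad_V maps an (N, d) array of positions to an (N, d) array of gradients.
    """
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]      # diff[i, j] = x_i - x_j
    k, grad_k = gaussian_kernel(diff, h)      # shapes (N, N) and (N, N, d)
    repulsion = -grad_k.sum(axis=1) / n       # -(1/N) sum_j grad K(x_i - x_j)
    drift = -(k @ grad_V(x)) / n              # -(1/N) sum_j K(x_i - x_j) grad V(x_j)
    return repulsion + drift


def euler_flow(x0, grad_V, dt=0.05, steps=1000, h=1.0):
    """Forward Euler discretization of (1); dt and steps are illustrative."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * svgd_velocity(x, grad_V, h)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.normal(loc=3.0, scale=0.5, size=(50, 2))
    # Quadratic potential V(x) = |x|^2 / 2, so grad V(x) = x.
    x_final = euler_flow(x0, grad_V=lambda x: x)
    print("mean of particles after the flow:", x_final.mean(axis=0))
```

The first term pushes particles apart (for this kernel, $\nabla K$ points toward the origin, so its negative is repulsive), while the second is a kernel-weighted average of the drift $-\nabla V$ evaluated at the other particles.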

We are interested in the macroscopic behavior of the particle system (1) as $N \to \infty$, within the framework of mean field limits. Formally, this limit is described by the following nonlocal, nonlinear partial differential equation (PDE):

$$
\begin{aligned}
&\partial_t \rho = \nabla \cdot \left(\rho\left(K \ast (\nabla \rho+\nabla V\rho)\right)\right), \\
&\rho(0,\cdot) = \rho_0(\cdot).
\end{aligned} \tag{2}
$$
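
The formal link between (1) and (2) can be sketched as follows. This is a heuristic computation, not a proof, and the notation $\mu^N_t$ and $v[\mu]$ is introduced here only for illustration. Writing $\mu^N_t = \frac{1}{N}\sum_{j=1}^N \delta_{x_j(t)}$ for the empirical measure of the particles, the right-hand side of (1) is a velocity field evaluated at $x_i(t)$:

$$
\begin{aligned}
\dot{x}_i(t) &= v[\mu^N_t](x_i(t)), \qquad v[\mu] := -\nabla K \ast \mu - K \ast (\nabla V \mu), \\
\partial_t \mu^N_t &= -\nabla \cdot \left(\mu^N_t\, v[\mu^N_t]\right) \quad \text{(in the sense of distributions)}, \\
\nabla K \ast \mu &= K \ast \nabla \mu \quad\Longrightarrow\quad v[\mu] = -K \ast (\nabla \mu + \nabla V \mu).
\end{aligned}
$$

Substituting the last identity into the continuity equation gives $\partial_t \mu = \nabla \cdot \left(\mu\left(K \ast (\nabla \mu + \nabla V \mu)\right)\right)$, which is (2) with $\rho$ replaced by $\mu^N_t$.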

In this article, we will prove the well-posedness of (1) and (2).

Introduction to Stein Variational Gradient Descent

The Stein Variational Gradient Descent (SVGD) algorithm was first introduced by Liu and Wang [LW16]. Its idea is to transport a set of $N$ particles $\{x_i\}_{i=1}^N$ in $\mathbb R^d$ so that their empirical measure

$$
\mu^N:=\frac{1}{N}\sum_{i=1}^N\delta_{x_i}
$$

approximates the target probability measure

$$
\rho_\infty(x)\,dx = Z^{-1} e^{-V(x)}\,dx
$$

with an unknown normalization factor $Z$.
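
A short computation indicates why an unknown $Z$ is not an obstacle for a method that uses the target only through the gradient of its log-density:

$$
\nabla \log \rho_\infty(x) = \nabla\left(-V(x) - \log Z\right) = -\nabla V(x).
$$

The constant $\log Z$ drops out under differentiation, so only $\nabla V$ needs to be evaluated.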

An Introduction to Mean-Field Langevin Dynamics

Optimization over the space of probability measures is not only widely applicable, but also offers a useful perspective for analyzing certain complicated finite-dimensional nonconvex optimization problems. In particular, lifting such problems to optimization problems over probability measures can lead to better structural properties, such as convexity. Mean-field Langevin dynamics is a representative example of this idea. Its central motivation is that some highly nonconvex optimization problems arising in neural network training become better behaved when reformulated as the optimization of a functional on the space of probability measures. This viewpoint also makes it possible to build a theoretical foundation for understanding the convergence of SGD. In what follows, we briefly introduce this perspective, mainly following the paper [Hu, Kaitong, et al.]. The main analytical framework of this theory is illustrated in Figure 2.

Optimization over the Space of Probability Measures