Reversible, Conductance and Rapid Mixing of Markov Chains

Introduction

Markov Chain

Let $(\Omega,\mathcal{F})$ be a measurable space, $P:\Omega\times \mathcal{F}\to [0,1]$ be a transition kernel, that is,

  • for every $x\in \Omega$, $P(x,\cdot)$ is a probability measure on $\mathcal{F}$;
  • for every $A\in \mathcal{F}$, $x\mapsto P(x,A)$ is measurable.

We call the triple $(\Omega,\mathcal F,P)$ a Markov scheme.

Fix an initial distribution $\mu_0$ on $(\Omega,\mathcal F)$. By the Ionescu–Tulcea extension theorem, there exists a unique probability measure $\mathbb{P}$ on the product space $(\Omega^{\mathbb N_0},\mathcal F^{\otimes\mathbb N_0})$ such that for the coordinate process $\{X_k\}$,
$$
\mathbb P(X_0\in A_0,\dots,X_n\in A_n)
=
\int_{A_0}\mu_0(dx_0)\int_{A_1}P(x_0,dx_1)\cdots\int_{A_n}P(x_{n-1},dx_n),
\qquad A_0,\dots,A_n\in\mathcal F.
$$

Equivalently:

  • $X_0\sim\mu_0$;

  • for every $k\ge 0$ and $A\in\mathcal F$,

$$
\mathbb P(X_{k+1}\in A\mid X_0,\dots,X_k)
=
\mathbb P(X_{k+1}\in A\mid X_k)
\quad \text{a.s.} \qquad \text{(Markov property)}.
$$

In this case, the coordinate process $(X_k)_{k\ge 0}$ (and, more generally, any process with the same finite-dimensional
distributions) is called a Markov chain with initial distribution $\mu_0$ and transition kernel $P$.
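On a finite state space, a transition kernel is simply a row-stochastic matrix, and sampling $X_{k+1}\sim P(X_k,\cdot)$ amounts to inverting the CDF of the current row. A minimal sketch, using a hypothetical 3-state chain chosen purely for illustration:

```python
import random

# A hypothetical 3-state chain: each row of P is a probability measure P(x, .).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

# Every row must sum to 1, i.e. P(x, Omega) = 1.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)

def step(x, rng=random):
    """Sample X_{k+1} ~ P(x, .) by inverting the CDF of row x."""
    u, acc = rng.random(), 0.0
    for y, p in enumerate(P[x]):
        acc += p
        if u < acc:
            return y
    return len(P) - 1  # guard against floating-point round-off

# Simulate a short trajectory started from X_0 = 0.
random.seed(0)
path = [0]
for _ in range(10):
    path.append(step(path[-1]))
```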

Invariant Measure

Definition: A probability measure $\pi$ on $(\Omega,\mathcal F)$ is invariant (stationary) for $P$ if
$$
(\pi P)(A):=\int_\Omega \pi(dx)\ P(x,A) =\pi(A)\qquad\text{for all }A\in\mathcal F.
$$
Equivalently, if $X_0\sim\pi$, then $X_k\sim\pi$ for every $k\ge 1.$
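For a finite chain, invariance is the fixed-point equation $\pi P=\pi$ and can be checked directly. Continuing with the illustrative 3-state chain (the candidate measure $\pi$ below is a guess to be verified):

```python
# Illustrative 3-state chain and a candidate invariant measure.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]

# (pi P)(y) = sum_x pi(x) P(x, y); invariance means pi P == pi.
piP = [sum(pi[x] * P[x][y] for x in range(3)) for y in range(3)]
```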

Markov Operator

Consider the Hilbert space $L^2=L^2(\Omega,\mathcal{F},\pi)$ with the inner product
$$
\langle f,g\rangle_\pi:=\int_\Omega f\ g\ d\pi.
$$
Given an invariant measure $\pi$, every Markov scheme defines a positive (i.e. $f\ge 0\Rightarrow Mf\ge 0$) linear operator $M:L^2\to L^2$, in fact a contraction, by
$$
(Mf)(x) := \int_\Omega f(y)\ P(x, dy).
$$
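On a finite state space $M$ is just the transition matrix acting on functions (viewed as vectors); in particular $M\mathbf{1}=\mathbf{1}$, and $f\ge 0$ implies $Mf\ge 0$. A sketch with the same illustrative chain:

```python
# Illustrative 3-state chain with invariant measure pi.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]

def M(f):
    """(Mf)(x) = integral of f(y) P(x, dy); on a finite space, a matrix-vector product."""
    return [sum(P[x][y] * f[y] for y in range(3)) for x in range(3)]

def inner(f, g):
    """<f, g>_pi = integral of f * g with respect to pi."""
    return sum(f[x] * g[x] * pi[x] for x in range(3))

ones = M([1.0, 1.0, 1.0])   # M maps constant functions to themselves
Mf = M([1.0, -2.0, 3.0])    # a sample evaluation of the operator
```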

Reversibility

Roughly speaking, a Markov scheme is reversible if for any two sets $A,B\in \mathcal{F}$, it steps from $A$ to $B$ as often as from $B$ to $A$.

Definition: A Markov scheme is reversible with respect to a probability measure $\pi$ if, as measures on $\Omega\times\Omega$,
$$
\pi(dx)P(x,dy) = \pi(dy)P(y,dx).
$$
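On a finite state space the definition reduces to detailed balance: $\pi(x)P(x,y)=\pi(y)P(y,x)$ for every pair of states. A quick numerical check on an illustrative lazy walk on the path $0$–$1$–$2$:

```python
# Illustrative lazy random walk on the path 0-1-2 and a candidate measure pi.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]

# Detailed balance: pi(x) P(x, y) == pi(y) P(y, x) for every pair of states.
balanced = all(abs(pi[x] * P[x][y] - pi[y] * P[y][x]) < 1e-12
               for x in range(3) for y in range(3))
```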
Lemma 1: The following are equivalent:

(1) Markov scheme is reversible with respect to probability measure $\pi$;

(2) for any two sets $A,B\in \mathcal{F}$,
$$
\int_B \pi(dx)P(x,A) = \int_A\pi(dx)P(x,B);
$$
(3) for any measurable function $F:\Omega\times \Omega \to\mathbb{R}$ that makes the integral below finite, we have
$$
\int_\Omega\int_\Omega F(x,y)\ \pi(dx)P(x,dy) = \int_\Omega\int_\Omega F(y,x)\ \pi(dx)P(x,dy);
$$
(4) the Markov operator $M$ is self-adjoint.

Proof: $(1)\Longrightarrow (2):$ for any two sets $A,B\in \mathcal{F}$,
$$
\begin{aligned}
\int_B \pi(dx)P(x,A) &= \int_B \pi(dx)\int_A P(x,dy)
\xlongequal{\text{Fubini}} \int_A\int_B \pi(dx)P(x,dy)
\xlongequal{(1)} \int_A\int_B \pi(dy)P(y,dx)\\
&\xlongequal{\text{Fubini}}\int_A\pi(dy)\int_B P(y,dx)
=\int_A\pi(dy)P(y,B)
= \int_A\pi(dx)P(x,B);
\end{aligned}
$$
$(2)\Longrightarrow (3):$ by (2), the identity (3) holds for products of indicators $F(x,y)=1_A(x)1_B(y)$; the Monotone Class Theorem then extends it to all bounded measurable $F:\Omega\times \Omega \to\mathbb{R}$, and a standard truncation argument to all $F$ for which the integrals are finite.

$(3)\Longrightarrow (4):$ For $f,g$ in $L^2$,
$$
\langle f,Mg\rangle_\pi=\int_\Omega f(x)\ (Mg)(x)\ \pi(dx) = \int_\Omega f(x)\int_\Omega g(y)\ P(x, dy)\ \pi(dx) \xlongequal{\text{Fubini}}\int_\Omega\int_\Omega f(x)g(y)\ \pi(dx)P(x,dy)
$$

$$
\langle Mf,g\rangle_\pi= \int_\Omega (Mf)(x)\ g(x)\ \pi(dx)= \int_\Omega g(x)\int_\Omega f(y)\ P(x, dy)\ \pi(dx)\xlongequal{\text{Fubini}}\int_\Omega\int_\Omega f(y)g(x)\ \pi(dx)P(x,dy)
$$

By using (3) with $F(x,y)=f(x)g(y)$, we obtain $\langle f,Mg\rangle_\pi=\langle Mf,g\rangle_\pi$ directly.

$(4)\Longrightarrow (1):$ taking $f=1_A$ and $g=1_B$ in the two displays above shows that the measures $\pi(dx)P(x,dy)$ and $\pi(dy)P(y,dx)$ agree on all measurable rectangles $A\times B$, hence on all of $\mathcal F\otimes\mathcal F$.
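The equivalence of (1) and (4) is easy to test numerically on a finite reversible chain: the two inner products from the displays above must coincide. A sketch (the chain and the test functions are illustrative):

```python
# Illustrative reversible chain: pi(x)P(x,y) = pi(y)P(y,x) holds for this pair.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]

def M(f):
    """Markov operator as a matrix-vector product."""
    return [sum(P[x][y] * f[y] for y in range(3)) for x in range(3)]

def inner(f, g):
    """<f, g>_pi on the finite state space."""
    return sum(f[x] * g[x] * pi[x] for x in range(3))

# Self-adjointness: <f, Mg>_pi == <Mf, g>_pi for arbitrary f, g.
f, g = [1.0, 0.0, -1.0], [2.0, 1.0, 0.0]
lhs, rhs = inner(f, M(g)), inner(M(f), g)
```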

Lemma 2: If the Markov scheme is reversible with respect to the probability measure $\pi$, then for any $f\in L^2$,
$$
|\langle f,Mf\rangle_\pi|\le \langle f,f\rangle_\pi,
$$
with equality when $f$ is a constant function. Since $M$ is self-adjoint, it follows that the operator norm, and hence the spectral radius, of $M$ is exactly $1$.

Proof: For any $f\in L^2$, applying Lemma 1(3) with $F(x,y)=f(x)^2$ shows that $\int_\Omega\int_\Omega f(x)^2\ \pi(dy)P(y,dx)=\langle f,f\rangle_\pi$, and expanding the square gives
$$
\langle f,f\rangle_\pi\ \pm\ \langle f,Mf\rangle_\pi=\frac{1}{2}\int_\Omega\int_\Omega (f(x)\pm f(y))^2\ \pi(dy)P(y,dx)\ge 0.
$$
And then the desired result follows.
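Lemma 2 can be sanity-checked by evaluating $\langle f,Mf\rangle_\pi$ as the double sum $\sum_{x,y}\pi(x)f(x)P(x,y)f(y)$ for many random $f$ on an illustrative reversible chain:

```python
import random

# Illustrative reversible chain with invariant measure pi.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]

def quad(f):
    """<f, Mf>_pi written directly as a double sum over states."""
    return sum(pi[x] * f[x] * P[x][y] * f[y] for x in range(3) for y in range(3))

def norm2(f):
    """<f, f>_pi."""
    return sum(pi[x] * f[x] ** 2 for x in range(3))

# |<f, Mf>| <= <f, f> for 100 random test functions; equality for constants.
random.seed(1)
fs = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(100)]
ok = all(abs(quad(f)) <= norm2(f) + 1e-12 for f in fs)
const = [3.0, 3.0, 3.0]
```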

Lemma 3: If the Markov scheme is reversible with respect to probability measure $\pi$, then $\pi$ is an invariant measure of the Markov scheme.

Proof: For all $A\in\mathcal{F}$,
$$
(\pi P)(A)=\int_\Omega \pi(dx)\ P(x,A)=\int_\Omega \pi(dx)\ \int_A P(x,dy)\xlongequal{\text{Fubini}}\int_\Omega \int_\Omega 1_A(y)\pi(dx)P(x,dy)\xlongequal{\text{Lemma 1(3)}}\int_\Omega \int_\Omega 1_A(x)\pi(dx)P(x,dy)=\pi(A).
$$

Laziness of Markov Chains

Definition: We call the Markov scheme lazy if
$$
P(x,\{x\})\ge \frac12\qquad\text{for all }x\in\Omega.
$$
(Informally: with probability at least $1/2$ the chain stays put.)

Lemma 4: If the Markov scheme is lazy and reversible, then $M$ is positive semi-definite.

Proof: Consider the operator $2M-I$, which corresponds to the kernel
$$
\widetilde P(x,A):=2P(x,A)-\mathbf 1_A(x).
$$
Laziness ensures $\widetilde P(x,\cdot)$ is still a probability measure (nonnegativity at $\{x\}$ is exactly $2P(x,\{x\})-1\ge 0$), hence $2M-I$ is again a Markov operator and is self-adjoint. And then by Lemma 2,
$$
\langle f,Mf\rangle_\pi
=
\frac12\langle f,f\rangle_\pi
+
\frac12\langle f,(2M-I)f\rangle_\pi
\ge \frac12\langle f,f\rangle_\pi-\frac12\langle f,f\rangle_\pi=0.
$$
And then the desired result follows.

Note: Every Markov scheme can be made lazy by simply tossing a coin at each step and making a move only if it is tails. So (at the cost of a little slow-down) we can assume that $M$ is positive semi-definite, which will be very convenient.
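The coin-tossing construction replaces $P$ by $(I+P)/2$. The deterministic flip on two states shows why laziness matters: for the original chain $\langle f,Mf\rangle_\pi$ can be negative, while the lazy version is positive semi-definite. A sketch (all numbers illustrative):

```python
# A reversible but non-lazy chain (deterministic flip on two states) and its
# lazy version (I + P)/2; pi = (1/2, 1/2) is invariant for both.
pi = [0.5, 0.5]
P0 = [[0.0, 1.0],
      [1.0, 0.0]]
lazy = [[(P0[x][y] + (1.0 if x == y else 0.0)) / 2 for y in range(2)]
        for x in range(2)]

def quad(P, f):
    """<f, Mf>_pi for the kernel P."""
    return sum(pi[x] * f[x] * P[x][y] * f[y] for x in range(2) for y in range(2))

f = [1.0, -1.0]
before, after = quad(P0, f), quad(lazy, f)  # negative vs. nonnegative
```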

Ergodic Flow

Definition: We define the ergodic flow $\Phi: \mathcal{F}\to[0,1]$ of the Markov scheme by
$$
\Phi(A) = \int_A P(x,A^c)\pi(dx),
$$
which is the probability of the event that in choosing $X_0$ from the invariant distribution we have $X_0\in A$ but $X_1\notin A$.

Lemma 5: For all $A\in \mathcal{F}$, we have $\Phi(A)=\Phi(A^c)$.

Proof: For all $A\in \mathcal{F}$, from the assumption that $\pi$ is invariant, we get
$$
\Phi(A)-\Phi(A^c) = \int_{A} P(x,A^c)\pi(dx)-\int_{A^c} P(x,A)\pi(dx)=\int_{A} (1-P(x,A))\pi(dx)-\int_{A^c} P(x,A)\pi(dx)=\pi(A)-\int_\Omega P(x,A)\pi(dx) = 0.
$$
And then the desired result follows.
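Lemma 5 is easy to verify numerically: on a finite chain the ergodic flow is a finite sum, and $\Phi(A)$ must agree with $\Phi(A^c)$. A sketch on the illustrative 3-state walk:

```python
# Illustrative reversible 3-state chain with invariant distribution pi.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]

def flow(A):
    """Phi(A) = integral over A of P(x, A^c) pi(dx), here a finite sum."""
    Ac = [y for y in range(3) if y not in A]
    return sum(pi[x] * sum(P[x][y] for y in Ac) for x in A)

a, b = flow({0}), flow({1, 2})  # Phi(A) and Phi(A^c)
```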

Note that the computation also works backward; this observation yields the following lemma.

Lemma 6: If $\nu$ is any probability measure on $(\Omega,\mathcal{F})$ such that the set-function
$$
\Psi(A) =\int_A P(x,A^c)\nu(dx)
$$
is invariant under complementation, namely, $\Psi(A)=\Psi(A^c)$, then $\nu$ is invariant.

Proof: For all $A\in \mathcal{F}$, from the assumption that $\Psi(A)=\Psi(A^c)$, we have
$$
0=\Psi(A)-\Psi(A^c)=\int_{A} P(x,A^c)\nu(dx)-\int_{A^c} P(x,A)\nu(dx)=\int_{A} (1-P(x,A))\nu(dx)-\int_{A^c} P(x,A)\nu(dx)=\nu(A)-\int_\Omega P(x,A)\nu(dx),
$$
which means $\nu P=\nu$, i.e. $\nu$ is invariant.

Conductance

Definition: The conductance of the Markov scheme is
$$
\Phi:=\inf_{0<\pi(A)<1/2} \frac{\Phi(A)}{\pi(A)}.
$$
For every $0\le s\le 1$, the $s$-conductance is defined by
$$
\Phi_s:=\inf_{s<\pi(A)<1/2} \frac{\Phi(A)}{\pi(A)-s}.
$$
And we call the value $1-P(x,\{x\})$ the local conductance of the Markov scheme at the element $x$.
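On a small finite chain the conductance can be computed by brute force, enumerating all subsets with $0<\pi(A)<1/2$. A sketch (the chain is illustrative; here the minimizing sets turn out to be the endpoints of the path):

```python
from itertools import combinations

# Illustrative reversible 3-state chain with invariant distribution pi.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]

def flow(A):
    """Ergodic flow Phi(A) of the finite chain."""
    Ac = [y for y in range(3) if y not in A]
    return sum(pi[x] * sum(P[x][y] for y in Ac) for x in A)

# Conductance: minimize Phi(A)/pi(A) over sets with 0 < pi(A) < 1/2.
Phi = min(flow(A) / sum(pi[x] for x in A)
          for r in (1, 2)
          for A in combinations(range(3), r)
          if 0 < sum(pi[x] for x in A) < 0.5)
```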

Lemma 7: If the invariant measure $\pi$ has an atom $y$ (i.e. $\pi(\{y\})>0$), then the local conductance of the Markov scheme at the element $y$ is just $\Phi(\{y\})/\pi(\{y\})$; if moreover $\pi(\{y\})<1/2$, then
$$
\Phi\le \frac{\Phi(\{y\})}{\pi(\{y\})} =1-P(y,\{y\}).
$$
Proof:
$$
\Phi(\{y\})= \int_{\{y\}} P(x,\{y\}^c)\pi(dx)=P(y,\{y\}^c)\pi(\{y\})= (1-P(y,\{y\}))\pi(\{y\}).
$$
Since $0<\pi(\{y\})<1/2$, the set $\{y\}$ is admissible in the infimum defining $\Phi$, and the desired result follows.

Lemma 8: Let
$$
H_t=\{x\in\Omega:\ 1-P(x,\{x\})<t\},
$$
and assume $s=\pi(H_t)>0$. Then
$$
\Phi(H_t)<t\ \pi(H_t).
$$
As a consequence, if $s<1/2$, then the $(s/2)$-conductance of the Markov scheme is less than $2t$.
Proof: For $x\in H_t$ we have $\{x\}\subseteq H_t$, so $P(x,H_t^c)\le P(x,\{x\}^c)<t$, and therefore
$$
\Phi(H_t)=\int_{H_t} P(x,H_t^c)\pi(dx)\le\int_{H_t} P(x,\{x\}^c)\pi(dx)<t\ \pi(H_t)=ts.
$$
If $s<1/2$, then
$$
\Phi_{s/2}= \inf_{s/2<\pi(A)<1/2} \frac{\Phi(A)}{\pi(A)-s/2}\le \frac{\Phi(H_t)}{\pi(H_t)-s/2}< \frac{ts}{s-s/2}=2t.
$$

Mixing Time

Let $\mu_k$ denote the law of $X_k$. By definition, we have the recurrence
$$
\mu_k(A)=\int_\Omega \mu_{k-1}(dx)P(x,A).
$$
The reason for introducing conductance is that if the conductance $\Phi>0$, then
$$
\operatorname{TV}(\mu_k,\pi):=\sup_{A\in\mathcal{F}}|\mu_k(A)-\pi(A)|\to 0 \quad \text{as }k\to \infty.
$$
We have the following theorems.

Theorem 1: Let
$$
M=\sup_{A\in \mathcal{F}} \frac{\mu_0(A)}{\pi(A)}\tag{warm-start}.
$$
Then
$$
\operatorname{TV}(\mu_k,\pi):=\sup_{A\in\mathcal{F}}|\mu_k(A)-\pi(A)|\le \sqrt{M}\left(1-\frac{1}{2}\Phi^2\right)^k
$$
Theorem 2: Let $0<s\le 1/2$ and $H_s=\sup\{|\mu_0(A)-\pi(A)|;\ \pi(A)\le s\}$. Assume that each atom has $\pi$-measure less than $1/2$. Then
$$
\operatorname{TV}(\mu_k,\pi):=\sup_{A\in\mathcal{F}}|\mu_k(A)-\pi(A)|\le 2H_s+\frac{2H_s}{s}\left(1-\frac{1}{2}\Phi_s^2\right)^k
$$
Furthermore, if $\pi$ is atom-free, then
$$
\operatorname{TV}(\mu_k,\pi):=\sup_{A\in\mathcal{F}}|\mu_k(A)-\pi(A)|\le H_s+\frac{H_s}{s}\left(1-\frac{1}{2}\Phi_s^2\right)^k
$$

See [1] for the proofs of Theorems 1 and 2.
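Theorem 1 can be illustrated numerically: iterate $\mu_k=\mu_{k-1}P$ on a small chain and compare $\operatorname{TV}(\mu_k,\pi)$ (half the $\ell^1$ distance on a finite space) with the bound $\sqrt{M}\,(1-\Phi^2/2)^k$. A sketch on the illustrative 3-state lazy walk started from a point mass:

```python
# Illustrative lazy reversible chain; its conductance is 1/2 (by enumeration).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]
Phi = 0.5

mu = [1.0, 0.0, 0.0]                            # point mass at state 0
M_warm = max(mu[x] / pi[x] for x in range(3))   # warm-start parameter M = 4

tvs, bounds = [], []
for k in range(1, 21):
    mu = [sum(mu[x] * P[x][y] for x in range(3)) for y in range(3)]
    # On a finite space, total variation is half the l1 distance.
    tvs.append(0.5 * sum(abs(mu[y] - pi[y]) for y in range(3)))
    bounds.append(M_warm ** 0.5 * (1 - Phi ** 2 / 2) ** k)

ok = all(tv <= b + 1e-12 for tv, b in zip(tvs, bounds))
```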

Definition: The mixing time in total variation distance is defined by
$$
t_{\text{TV}}(\varepsilon,\mu_0):=\inf\{k\in \mathbb{N}\mid \operatorname{TV}(\mu_k,\pi)\le \varepsilon\}.
$$

Relation to Sampling Algorithms

In non-asymptotic analysis of the convergence of sampling algorithms, mixing time plays a central role.

In much of the literature on sampling algorithms, it is common to assume warm start for the initial distribution.

Definition: We say that the initial distribution $\mu_0$ is $M$-warm if it satisfies
$$
\sup_{A\in \mathcal{F}} \frac{\mu_0(A)}{\pi(A)}\le M.
$$
Under the $M$-warm assumption, if the target distribution $\pi$ is atom-free, then $H_s$ defined in Theorem 2 above satisfies
$$
H_s=\sup\{|\mu_0(A)-\pi(A)|;\ \pi(A)\le s\}\le \max\{s,(M-1)s\}\le Ms.
$$
Then by Theorem 2, we get
$$
\operatorname{TV}(\mu_k,\pi)\le Ms+ M\left(1-\frac{1}{2}\Phi_s^2\right)^k\le Ms+ Me^{-\frac{k}{2}\Phi_s^2}.
$$
Then $\operatorname{TV}(\mu_k,\pi)\le \varepsilon$ follows from taking
$$
s=\frac{\varepsilon}{2M},\quad k\ge \frac{2}{\Phi_s^2}\log\frac{2M}{\varepsilon}.
$$
Hence, the problem of analyzing the mixing time is reduced to controlling the $s-$conductance $\Phi_s$.
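Plugging illustrative numbers into this recipe (the values of $M$, $\varepsilon$, and the assumed lower bound on $\Phi_s$ below are hypothetical):

```python
import math

# Hypothetical inputs: an M-warm start with M = 4, target accuracy eps = 0.01,
# and an assumed lower bound Phi_s >= 0.1 on the s-conductance at s = eps/(2M).
M, eps, Phi_s = 4.0, 0.01, 0.1
s = eps / (2 * M)
# Number of steps guaranteeing TV(mu_k, pi) <= eps by the bound above.
k = math.ceil(2 / Phi_s ** 2 * math.log(2 * M / eps))
```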

Reference

[1] Lovász, László, and Miklós Simonovits. “Random walks in a convex body and an improved volume algorithm.” Random structures & algorithms 4.4 (1993): 359-412.

The cover image in this article was taken at Glenorchy, New Zealand.

Reversible, Conductance and Rapid Mixing of Markov Chains
https://handsteinwang.github.io/2025/12/18/Markov_Chain_1/

Author: Handstein Wang
Posted on: 2025-12-18
Updated on: 2025-12-18