Reversible, Conductance and Rapid Mixing of Markov Chains

Introduction

Markov Chain

Let $(\Omega,\mathcal{F})$ be a measurable space, $P:\Omega\times \mathcal{F}\to [0,1]$ be a transition kernel, that is,

  • for every $x\in \Omega$, $P(x,\cdot)$ is a probability measure on $\mathcal{F}$;
  • for every $A\in \mathcal{F}$, $x\mapsto P(x,A)$ is measurable.

We call the triple $(\Omega,\mathcal F,P)$ a Markov scheme.

Fix an initial distribution $\mu_0$ on $(\Omega,\mathcal F)$. By the Ionescu–Tulcea extension theorem, there exists a unique probability measure $\mathbb{P}$ on the product space $(\Omega^{\mathbb N_0},\mathcal F^{\otimes\mathbb N_0})$ such that for the coordinate process $\{X_k\}$,
$$
\mathbb P(X_0\in A_0,\dots,X_n\in A_n)
=
\int_{A_0}\mu_0(dx_0)\int_{A_1}P(x_0,dx_1)\cdots\int_{A_n}P(x_{n-1},dx_n),
\qquad A_0,\dots,A_n\in\mathcal F.
$$

Equivalently:

  • $X_0\sim\mu_0$;

  • for every $k\ge 0$ and $A\in\mathcal F$,

$$
\mathbb P(X_{k+1}\in A\mid X_0,\dots,X_k)
=
\mathbb P(X_{k+1}\in A\mid X_k)
\quad \text{a.s.} \qquad \text{(Markov property)}.
$$

In this case, the coordinate process $(X_k)_{k\ge 0}$ (and, more generally, any process with the same finite-dimensional
distributions) is called a Markov chain with initial distribution $\mu_0$ and transition kernel $P$.
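On a finite state space, a transition kernel is simply a row-stochastic matrix, and sampling $X_{k+1}\sim P(X_k,\cdot)$ amounts to inverting the CDF of the current row. A minimal sketch, using a hypothetical 3-state chain chosen purely for illustration:

```python
import random

# A hypothetical 3-state chain: each row of P is a probability measure P(x, .).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

# Every row must sum to 1, i.e. P(x, Omega) = 1.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)

def step(x, rng=random):
    """Sample X_{k+1} ~ P(x, .) by inverting the CDF of row x."""
    u, acc = rng.random(), 0.0
    for y, p in enumerate(P[x]):
        acc += p
        if u < acc:
            return y
    return len(P) - 1  # guard against floating-point round-off

# Simulate a short trajectory started from X_0 = 0.
random.seed(0)
path = [0]
for _ in range(10):
    path.append(step(path[-1]))
```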

Invariant Measure

Definition: A probability measure $\pi$ on $(\Omega,\mathcal F)$ is invariant (stationary) for $P$ if
$$
(\pi P)(A):=\int_\Omega \pi(dx)\ P(x,A) =\pi(A)\qquad\text{for all }A\in\mathcal F.
$$
Equivalently, if $X_0\sim\pi$, then $X_k\sim\pi$ for every $k\ge 1.$
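For a finite chain, invariance is the fixed-point equation $\pi P=\pi$ and can be checked directly. Continuing with the illustrative 3-state chain (the candidate measure $\pi$ below is a guess to be verified):

```python
# Illustrative 3-state chain and a candidate invariant measure.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]

# (pi P)(y) = sum_x pi(x) P(x, y); invariance means pi P == pi.
piP = [sum(pi[x] * P[x][y] for x in range(3)) for y in range(3)]
```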

Markov Operator

Consider the Hilbert space $L^2=L^2(\Omega,\mathcal{F},\pi)$ with the inner product
$$
\langle f,g\rangle_\pi:=\int_\Omega f\ g\ d\pi.
$$
Given an invariant measure $\pi$, every Markov scheme defines a positive (i.e. $f\ge 0\Rightarrow Mf\ge 0$) linear operator $M:L^2\to L^2$, in fact a contraction, by
$$
(Mf)(x) := \int_\Omega f(y)\ P(x, dy).
$$
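On a finite state space $M$ is just the transition matrix acting on functions (viewed as vectors); in particular $M\mathbf{1}=\mathbf{1}$, and $f\ge 0$ implies $Mf\ge 0$. A sketch with the same illustrative chain:

```python
# Illustrative 3-state chain with invariant measure pi.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]

def M(f):
    """(Mf)(x) = integral of f(y) P(x, dy); on a finite space, a matrix-vector product."""
    return [sum(P[x][y] * f[y] for y in range(3)) for x in range(3)]

def inner(f, g):
    """<f, g>_pi = integral of f * g with respect to pi."""
    return sum(f[x] * g[x] * pi[x] for x in range(3))

ones = M([1.0, 1.0, 1.0])   # M maps constant functions to themselves
Mf = M([1.0, -2.0, 3.0])    # a sample evaluation of the operator
```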

Reversibility

Roughly speaking, a Markov scheme is reversible if for any two sets $A,B\in \mathcal{F}$, it steps from $A$ to $B$ as often as from $B$ to $A$.

Definition: A Markov scheme is reversible with respect to a probability measure $\pi$ if, as measures on $\Omega\times\Omega$,
$$
\pi(dx)P(x,dy) = \pi(dy)P(y,dx).
$$
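On a finite state space the definition reduces to detailed balance: $\pi(x)P(x,y)=\pi(y)P(y,x)$ for every pair of states. A quick numerical check on an illustrative lazy walk on the path $0$–$1$–$2$:

```python
# Illustrative lazy random walk on the path 0-1-2 and a candidate measure pi.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]

# Detailed balance: pi(x) P(x, y) == pi(y) P(y, x) for every pair of states.
balanced = all(abs(pi[x] * P[x][y] - pi[y] * P[y][x]) < 1e-12
               for x in range(3) for y in range(3))
```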
Lemma 1: The following are equivalent:

(1) Markov scheme is reversible with respect to probability measure $\pi$;

(2) for any two sets $A,B\in \mathcal{F}$,
$$
\int_B \pi(dx)P(x,A) = \int_A\pi(dx)P(x,B);
$$
(3) for any measurable function $F:\Omega\times \Omega \to\mathbb{R}$ that makes the integral below finite, we have
$$
\int_\Omega\int_\Omega F(x,y)\ \pi(dx)P(x,dy) = \int_\Omega\int_\Omega F(y,x)\ \pi(dx)P(x,dy);
$$
(4) the Markov operator $M$ is self-adjoint.

Proof: $(1)\Longrightarrow (2):$ for any two sets $A,B\in \mathcal{F}$,
$$
\begin{aligned}
\int_B \pi(dx)P(x,A) &= \int_B \pi(dx)\int_A P(x,dy)
\xlongequal{\text{Fubini}} \int_A\int_B \pi(dx)P(x,dy)
\xlongequal{(1)} \int_A\int_B \pi(dy)P(y,dx)\\
&\xlongequal{\text{Fubini}}\int_A\pi(dy)\int_B P(y,dx)
=\int_A\pi(dy)P(y,B)
= \int_A\pi(dx)P(x,B);
\end{aligned}
$$
$(2)\Longrightarrow (3):$ by (2), the identity (3) holds for products of indicators $F(x,y)=1_A(x)1_B(y)$; the Monotone Class Theorem then extends it to all bounded measurable $F:\Omega\times \Omega \to\mathbb{R}$, and a standard truncation argument to all $F$ for which the integrals are finite.

$(3)\Longrightarrow (4):$ For $f,g$ in $L^2$,
$$
\langle f,Mg\rangle_\pi=\int_\Omega f(x)\ (Mg)(x)\ \pi(dx) = \int_\Omega f(x)\int_\Omega g(y)\ P(x, dy)\ \pi(dx) \xlongequal{\text{Fubini}}\int_\Omega\int_\Omega f(x)g(y)\ \pi(dx)P(x,dy)
$$

$$
\langle Mf,g\rangle_\pi= \int_\Omega (Mf)(x)\ g(x)\ \pi(dx)= \int_\Omega g(x)\int_\Omega f(y)\ P(x, dy)\ \pi(dx)\xlongequal{\text{Fubini}}\int_\Omega\int_\Omega f(y)g(x)\ \pi(dx)P(x,dy)
$$

By using (3) with $F(x,y)=f(x)g(y)$, we obtain $\langle f,Mg\rangle_\pi=\langle Mf,g\rangle_\pi$ directly.

$(4)\Longrightarrow (1):$ taking $f=1_A$ and $g=1_B$ in the two displays above shows that the measures $\pi(dx)P(x,dy)$ and $\pi(dy)P(y,dx)$ agree on all measurable rectangles $A\times B$, hence on all of $\mathcal F\otimes\mathcal F$.
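The equivalence of (1) and (4) is easy to test numerically on a finite reversible chain: the two inner products from the displays above must coincide. A sketch (the chain and the test functions are illustrative):

```python
# Illustrative reversible chain: pi(x)P(x,y) = pi(y)P(y,x) holds for this pair.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]

def M(f):
    """Markov operator as a matrix-vector product."""
    return [sum(P[x][y] * f[y] for y in range(3)) for x in range(3)]

def inner(f, g):
    """<f, g>_pi on the finite state space."""
    return sum(f[x] * g[x] * pi[x] for x in range(3))

# Self-adjointness: <f, Mg>_pi == <Mf, g>_pi for arbitrary f, g.
f, g = [1.0, 0.0, -1.0], [2.0, 1.0, 0.0]
lhs, rhs = inner(f, M(g)), inner(M(f), g)
```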

Lemma 2: If the Markov scheme is reversible with respect to the probability measure $\pi$, then for any $f\in L^2$,
$$
|\langle f,Mf\rangle_\pi|\le \langle f,f\rangle_\pi,
$$
with equality when $f$ is a constant function. Since $M$ is self-adjoint, it follows that the operator norm, and hence the spectral radius, of $M$ is exactly $1$.

Proof: For any $f\in L^2$, applying Lemma 1(3) with $F(x,y)=f(x)^2$ shows that $\int_\Omega\int_\Omega f(x)^2\ \pi(dy)P(y,dx)=\langle f,f\rangle_\pi$, and expanding the square gives
$$
\langle f,f\rangle_\pi\ \pm\ \langle f,Mf\rangle_\pi=\frac{1}{2}\int_\Omega\int_\Omega (f(x)\pm f(y))^2\ \pi(dy)P(y,dx)\ge 0.
$$
And then the desired result follows.
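Lemma 2 can be sanity-checked by evaluating $\langle f,Mf\rangle_\pi$ as the double sum $\sum_{x,y}\pi(x)f(x)P(x,y)f(y)$ for many random $f$ on an illustrative reversible chain:

```python
import random

# Illustrative reversible chain with invariant measure pi.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]

def quad(f):
    """<f, Mf>_pi written directly as a double sum over states."""
    return sum(pi[x] * f[x] * P[x][y] * f[y] for x in range(3) for y in range(3))

def norm2(f):
    """<f, f>_pi."""
    return sum(pi[x] * f[x] ** 2 for x in range(3))

# |<f, Mf>| <= <f, f> for 100 random test functions; equality for constants.
random.seed(1)
fs = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(100)]
ok = all(abs(quad(f)) <= norm2(f) + 1e-12 for f in fs)
const = [3.0, 3.0, 3.0]
```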

Lemma 3: If the Markov scheme is reversible with respect to probability measure $\pi$, then $\pi$ is an invariant measure of the Markov scheme.

Proof: For all $A\in\mathcal{F}$,
$$
(\pi P)(A)=\int_\Omega \pi(dx)\ P(x,A)=\int_\Omega \pi(dx)\ \int_A P(x,dy)\xlongequal{\text{Fubini}}\int_\Omega \int_\Omega 1_A(y)\pi(dx)P(x,dy)\xlongequal{\text{Lemma 1(3)}}\int_\Omega \int_\Omega 1_A(x)\pi(dx)P(x,dy)=\pi(A).
$$

Laziness of Markov Chains

Definition: We call the Markov scheme lazy if
$$
P(x,\{x\})\ge \frac12\qquad\text{for all }x\in\Omega.
$$
(Informally: with probability at least $1/2$ the chain stays put.)

Lemma 4: If the Markov scheme is lazy and reversible, then $M$ is positive semi-definite.

Proof: Consider the operator $2M-I$, which corresponds to the kernel
$$
\widetilde P(x,A):=2P(x,A)-\mathbf 1_A(x).
$$
Laziness ensures $\widetilde P(x,\cdot)$ is still a probability measure (nonnegativity at $\{x\}$ is exactly $2P(x,\{x\})-1\ge 0$), hence $2M-I$ is again a Markov operator and is self-adjoint. And then by Lemma 2,
$$
\langle f,Mf\rangle_\pi
=
\frac12\langle f,f\rangle_\pi
+
\frac12\langle f,(2M-I)f\rangle_\pi
\ge \frac12\langle f,f\rangle_\pi-\frac12\langle f,f\rangle_\pi=0.
$$
And then the desired result follows.

Note: Every Markov scheme can be made lazy by simply tossing a coin at each step and making a move only if it is tails. So (at the cost of a little slow-down) we can assume that $M$ is positive semi-definite, which will be very convenient.
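The coin-tossing construction replaces $P$ by $(I+P)/2$. The deterministic flip on two states shows why laziness matters: for the original chain $\langle f,Mf\rangle_\pi$ can be negative, while the lazy version is positive semi-definite. A sketch (all numbers illustrative):

```python
# A reversible but non-lazy chain (deterministic flip on two states) and its
# lazy version (I + P)/2; pi = (1/2, 1/2) is invariant for both.
pi = [0.5, 0.5]
P0 = [[0.0, 1.0],
      [1.0, 0.0]]
lazy = [[(P0[x][y] + (1.0 if x == y else 0.0)) / 2 for y in range(2)]
        for x in range(2)]

def quad(P, f):
    """<f, Mf>_pi for the kernel P."""
    return sum(pi[x] * f[x] * P[x][y] * f[y] for x in range(2) for y in range(2))

f = [1.0, -1.0]
before, after = quad(P0, f), quad(lazy, f)  # negative vs. nonnegative
```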

Ergodic Flow

Definition: We define the ergodic flow $\Phi: \mathcal{F}\to[0,1]$ of the Markov scheme by
$$
\Phi(A) = \int_A P(x,A^c)\pi(dx),
$$
which is the probability of the event that in choosing $X_0$ from the invariant distribution we have $X_0\in A$ but $X_1\notin A$.

Lemma 5: For all $A\in \mathcal{F}$, we have $\Phi(A)=\Phi(A^c)$.

Proof: For all $A\in \mathcal{F}$, from the assumption that $\pi$ is invariant, we get
$$
\Phi(A)-\Phi(A^c) = \int_{A} P(x,A^c)\pi(dx)-\int_{A^c} P(x,A)\pi(dx)=\int_{A} (1-P(x,A))\pi(dx)-\int_{A^c} P(x,A)\pi(dx)=\pi(A)-\int_\Omega P(x,A)\pi(dx) = 0.
$$
And then the desired result follows.
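Lemma 5 is easy to verify numerically: on a finite chain the ergodic flow is a finite sum, and $\Phi(A)$ must agree with $\Phi(A^c)$. A sketch on the illustrative 3-state walk:

```python
# Illustrative reversible 3-state chain with invariant distribution pi.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]

def flow(A):
    """Phi(A) = integral over A of P(x, A^c) pi(dx), here a finite sum."""
    Ac = [y for y in range(3) if y not in A]
    return sum(pi[x] * sum(P[x][y] for y in Ac) for x in A)

a, b = flow({0}), flow({1, 2})  # Phi(A) and Phi(A^c)
```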

Note that the computation also works backward; this observation yields the following lemma.

Lemma 6: If $\nu$ is any probability measure on $(\Omega,\mathcal{F})$ such that the set-function
$$
\Psi(A) =\int_A P(x,A^c)\nu(dx)
$$
is invariant under complementation, namely, $\Psi(A)=\Psi(A^c)$, then $\nu$ is invariant.

Proof: For all $A\in \mathcal{F}$, from the assumption that $\Psi(A)=\Psi(A^c)$, we have
$$
0=\Psi(A)-\Psi(A^c)=\int_{A} P(x,A^c)\nu(dx)-\int_{A^c} P(x,A)\nu(dx)=\int_{A} (1-P(x,A))\nu(dx)-\int_{A^c} P(x,A)\nu(dx)=\nu(A)-\int_\Omega P(x,A)\nu(dx),
$$
which means $\nu P=\nu$, i.e. $\nu$ is invariant.

Conductance

Definition: The conductance of the Markov scheme is
$$
\Phi:=\inf_{0<\pi(A)<1/2} \frac{\Phi(A)}{\pi(A)}.
$$
For every $0\le s\le 1$, the $s$-conductance is defined by
$$
\Phi_s:=\inf_{s<\pi(A)<1/2} \frac{\Phi(A)}{\pi(A)-s}.
$$
And we call the value $1-P(x,\{x\})$ the local conductance of the Markov scheme at the element $x$.
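On a small finite chain the conductance can be computed by brute force, enumerating all subsets with $0<\pi(A)<1/2$. A sketch (the chain is illustrative; here the minimizing sets turn out to be the endpoints of the path):

```python
from itertools import combinations

# Illustrative reversible 3-state chain with invariant distribution pi.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]

def flow(A):
    """Ergodic flow Phi(A) of the finite chain."""
    Ac = [y for y in range(3) if y not in A]
    return sum(pi[x] * sum(P[x][y] for y in Ac) for x in A)

# Conductance: minimize Phi(A)/pi(A) over sets with 0 < pi(A) < 1/2.
Phi = min(flow(A) / sum(pi[x] for x in A)
          for r in (1, 2)
          for A in combinations(range(3), r)
          if 0 < sum(pi[x] for x in A) < 0.5)
```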

Lemma 7: If the invariant measure $\pi$ has an atom $y$ (i.e. $\pi(\{y\})>0$), then the local conductance of the Markov scheme at the element $y$ is just $\Phi(\{y\})/\pi(\{y\})$; if moreover $\pi(\{y\})<1/2$, then
$$
\Phi\le \frac{\Phi(\{y\})}{\pi(\{y\})} =1-P(y,\{y\}).
$$
Proof:
$$
\Phi(\{y\})= \int_{\{y\}} P(x,\{y\}^c)\pi(dx)=P(y,\{y\}^c)\pi(\{y\})= (1-P(y,\{y\}))\pi(\{y\}).
$$
Since $0<\pi(\{y\})<1/2$, the set $\{y\}$ is admissible in the infimum defining $\Phi$, and the desired result follows.

Lemma 8: Let
$$
H_t=\{x\in\Omega:\ 1-P(x,\{x\})<t\},
$$
and assume $s=\pi(H_t)>0$. Then
$$
\Phi(H_t)<t\ \pi(H_t).
$$
As a consequence, if $s<1/2$, then the $(s/2)$-conductance of the Markov scheme is less than $2t$.
Proof: For $x\in H_t$ we have $\{x\}\subseteq H_t$, so $P(x,H_t^c)\le P(x,\{x\}^c)<t$, and therefore
$$
\Phi(H_t)=\int_{H_t} P(x,H_t^c)\pi(dx)\le\int_{H_t} P(x,\{x\}^c)\pi(dx)<t\ \pi(H_t)=ts.
$$
If $s<1/2$, then
$$
\Phi_{s/2}= \inf_{s/2<\pi(A)<1/2} \frac{\Phi(A)}{\pi(A)-s/2}\le \frac{\Phi(H_t)}{\pi(H_t)-s/2}< \frac{ts}{s-s/2}=2t.
$$

Mixing Time

Let $\mu_k$ denote the law of $X_k$. By definition, we have the recurrence
$$
\mu_k(A)=\int_\Omega \mu_{k-1}(dx)P(x,A).
$$
The reason for introducing conductance is that if the conductance $\Phi>0$, then
$$
\operatorname{TV}(\mu_k,\pi):=\sup_{A\in\mathcal{F}}|\mu_k(A)-\pi(A)|\to 0 \quad \text{as }k\to \infty.
$$
We have the following theorems.

Theorem 1: Let
$$
M=\sup_{A\in \mathcal{F}} \frac{\mu_0(A)}{\pi(A)}\tag{warm-start}.
$$
Then
$$
\operatorname{TV}(\mu_k,\pi):=\sup_{A\in\mathcal{F}}|\mu_k(A)-\pi(A)|\le \sqrt{M}\left(1-\frac{1}{2}\Phi^2\right)^k
$$
Theorem 2: Let $0<s\le 1/2$ and $H_s=\sup\{|\mu_0(A)-\pi(A)|;\ \pi(A)\le s\}$. Assume that each atom has $\pi$-measure less than $1/2$. Then
$$
\operatorname{TV}(\mu_k,\pi):=\sup_{A\in\mathcal{F}}|\mu_k(A)-\pi(A)|\le 2H_s+\frac{2H_s}{s}\left(1-\frac{1}{2}\Phi_s^2\right)^k
$$
Furthermore, if $\pi$ is atom-free, then
$$
\operatorname{TV}(\mu_k,\pi):=\sup_{A\in\mathcal{F}}|\mu_k(A)-\pi(A)|\le H_s+\frac{H_s}{s}\left(1-\frac{1}{2}\Phi_s^2\right)^k
$$

See [1] for the proofs of Theorems 1 and 2.
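Theorem 1 can be illustrated numerically: iterate $\mu_k=\mu_{k-1}P$ on a small chain and compare $\operatorname{TV}(\mu_k,\pi)$ (half the $\ell^1$ distance on a finite space) with the bound $\sqrt{M}\,(1-\Phi^2/2)^k$. A sketch on the illustrative 3-state lazy walk started from a point mass:

```python
# Illustrative lazy reversible chain; its conductance is 1/2 (by enumeration).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]
Phi = 0.5

mu = [1.0, 0.0, 0.0]                            # point mass at state 0
M_warm = max(mu[x] / pi[x] for x in range(3))   # warm-start parameter M = 4

tvs, bounds = [], []
for k in range(1, 21):
    mu = [sum(mu[x] * P[x][y] for x in range(3)) for y in range(3)]
    # On a finite space, total variation is half the l1 distance.
    tvs.append(0.5 * sum(abs(mu[y] - pi[y]) for y in range(3)))
    bounds.append(M_warm ** 0.5 * (1 - Phi ** 2 / 2) ** k)

ok = all(tv <= b + 1e-12 for tv, b in zip(tvs, bounds))
```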

Definition: The mixing time in total variation distance is defined by
$$
t_{\text{TV}}(\varepsilon,\mu_0):=\inf\{k\in \mathbb{N}\mid \operatorname{TV}(\mu_k,\pi)\le \varepsilon\}.
$$

Relation to Sampling Algorithms

In non-asymptotic analysis of the convergence of sampling algorithms, mixing time plays a central role.

In much of the literature on sampling algorithms, it is common to assume warm start for the initial distribution.

Definition: We say that the initial distribution $\mu_0$ is $M$-warm if it satisfies
$$
\sup_{A\in \mathcal{F}} \frac{\mu_0(A)}{\pi(A)}\le M.
$$
Under the $M$-warm assumption, if the target distribution $\pi$ is atom-free, then $H_s$ defined in Theorem 2 above satisfies
$$
H_s=\sup\{|\mu_0(A)-\pi(A)|;\ \pi(A)\le s\}\le \max\{s,(M-1)s\}\le Ms.
$$
Then by Theorem 2, we get
$$
\operatorname{TV}(\mu_k,\pi)\le Ms+ M\left(1-\frac{1}{2}\Phi_s^2\right)^k\le Ms+ Me^{-\frac{k}{2}\Phi_s^2}.
$$
Then $\operatorname{TV}(\mu_k,\pi)\le \varepsilon$ follows from taking
$$
s=\frac{\varepsilon}{2M},\quad k\ge \frac{2}{\Phi_s^2}\log\frac{2M}{\varepsilon}.
$$
Hence, the problem of analyzing the mixing time is reduced to controlling the $s-$conductance $\Phi_s$.
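Plugging illustrative numbers into this recipe (the values of $M$, $\varepsilon$, and the assumed lower bound on $\Phi_s$ below are hypothetical):

```python
import math

# Hypothetical inputs: an M-warm start with M = 4, target accuracy eps = 0.01,
# and an assumed lower bound Phi_s >= 0.1 on the s-conductance at s = eps/(2M).
M, eps, Phi_s = 4.0, 0.01, 0.1
s = eps / (2 * M)
# Number of steps guaranteeing TV(mu_k, pi) <= eps by the bound above.
k = math.ceil(2 / Phi_s ** 2 * math.log(2 * M / eps))
```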

Reference

[1] Lovász, László, and Miklós Simonovits. “Random walks in a convex body and an improved volume algorithm.” Random structures & algorithms 4.4 (1993): 359-412.

The cover image in this article was taken at Glenorchy, New Zealand.

Reversible, Conductance and Rapid Mixing of Markov Chains
https://handsteinwang.github.io/2025/12/18/Markov_Chain_1/

Author: Handstein Wang
Posted on: 2025-12-18
Updated on: 2025-12-18