Properties of the Relative Entropy
In this article, we introduce some important properties of the relative entropy, including lower semicontinuity, convexity, and compactness of sublevel sets, all based on the Donsker-Varadhan variational formula.
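As a quick numerical illustration, the sketch below checks the Donsker-Varadhan formula on a three-point space. It assumes the standard form $D(\mu\|\nu)=\sup_f\{\int f\,d\mu-\log\int e^f\,d\nu\}$, which may differ in detail from the formulation used in the article; the distributions `p` and `q` and all function names are hypothetical choices made for this example.

```python
import math
import random

def relative_entropy(p, q):
    """Relative entropy D(p || q) = sum_i p_i * log(p_i / q_i) on a finite set."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dv_functional(f, p, q):
    """Donsker-Varadhan functional: E_p[f] - log E_q[exp(f)]."""
    mean_f = sum(pi * fi for pi, fi in zip(p, f))
    log_mgf = math.log(sum(qi * math.exp(fi) for qi, fi in zip(q, f)))
    return mean_f - log_mgf

# Hypothetical distributions on a three-point space.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
kl = relative_entropy(p, q)

# The supremum is attained at f* = log(p / q) (up to an additive constant).
f_star = [math.log(pi / qi) for pi, qi in zip(p, q)]
dv_at_optimum = dv_functional(f_star, p, q)

# Any other test function should give a value no larger than D(p || q).
rng = random.Random(0)
random_values = [
    dv_functional([rng.uniform(-3, 3) for _ in p], p, q) for _ in range(1000)
]
```

Since the right-hand side is a supremum of functionals that are affine in $\mu$, this representation is what makes convexity and lower semicontinuity plausible at a glance.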
Theorem 1 (Lusin’s theorem). Suppose that $X$ and $Y$ are Polish spaces, that $\mu$ is a finite Borel measure on $X$, that $f:X\to Y$ is Borel measurable, and that $\varepsilon>0$. Then there exists a compact subset $K$ of $X$, with
$$
\mu(X\setminus K)<\varepsilon,
$$
such that the restriction of $f$ to $K$ is continuous.
Definition 1. Suppose that $f$ is a function on the Borel subsets of a metric space $X$ taking values in $[0,\infty]$. We say $f$ is locally finite if for each $x \in X$ there exists a neighborhood $N$ of $x$ with $f(N) < \infty$.
Definition 1. A mapping $f$ from the Borel sets of a metrizable space $(X,\tau)$ to $[0,\infty]$ is tight if $f(K)<\infty$ for each compact $K$ in $X$ and
$$
f(A)=\sup\{f(K):K \text{ compact},\ K\subseteq A\},
\quad \text{for each } A\in \mathcal{B}(X).
$$
Borel Measure, Support and Regularity Property
Definition 1. If $(X,\tau)$ is a topological space, then the Borel $\sigma$-field $\mathcal{B}$ of $X$ is the $\sigma$-field generated by the open sets of $X$ and we say a measure $\mu$ defined on $(X,\mathcal{B})$ is a Borel measure.
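To make "generated by the open sets" concrete, here is a minimal sketch on a finite topological space, where closure under complements and finite unions already yields the whole $\sigma$-field. The three-point space and its topology are hypothetical choices for illustration, and `generated_sigma_field` is a name introduced here, not one from the article.

```python
def generated_sigma_field(universe, generators):
    """Smallest family containing `generators` that is closed under
    complements and finite unions; on a finite set this is the sigma-field."""
    universe = frozenset(universe)
    family = {frozenset(), universe} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        for a in list(family):
            for b in list(family):
                for candidate in (universe - a, a | b):
                    if candidate not in family:
                        family.add(candidate)
                        changed = True
    return family

# Hypothetical topology on X = {1, 2, 3}: the open sets form a chain.
X = {1, 2, 3}
open_sets = [set(), {1}, {1, 2}, {1, 2, 3}]
borel = generated_sigma_field(X, open_sets)
```

In this example the set $\{2\}$ is neither open nor closed, yet it is a Borel set, which shows that the Borel $\sigma$-field is in general strictly larger than the topology.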
Reversibility, Conductance and Rapid Mixing of Markov Chains
Let $(\Omega,\mathcal{F})$ be a measurable space, $P:\Omega\times \mathcal{F}\to [0,1]$ be a transition kernel, that is,
The Wasserstein distance, which measures the distance between two probability measures, is widely used in problems such as optimal transport. In German, Wasser means water and Stein means stone, but the name is not a description: the distance is named after the mathematician Leonid Vaserstein (surnames of this kind are common, for example Einstein, literally "one stone"). This article will mainly answer the following two questions:
In this article, we only consider the Wasserstein distance on the $d$-dimensional Euclidean space $\mathbb{R}^d$. Since $\mathbb{R}^d$ is a special Polish space, the conclusions of this article can also be extended to general Polish spaces.
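For intuition, the sketch below computes the $p$-Wasserstein distance in the simplest case $d=1$, between two empirical measures with the same number of equally weighted atoms. In that setting, pairing the sorted samples is an optimal coupling for $p\ge 1$. The function name `wasserstein_1d` and the sample values are assumptions made for this example, and the approach does not carry over directly to $d>1$.

```python
def wasserstein_1d(xs, ys, p=1):
    """p-Wasserstein distance between two empirical measures on R with the
    same number of equally weighted atoms (assumption: len(xs) == len(ys))."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("expected two nonempty samples of equal size")
    # In one dimension the monotone (sorted) coupling is optimal for p >= 1.
    cost = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys)))
    return (cost / len(xs)) ** (1.0 / p)

# Shifting every atom by 5 moves the measure a distance of 5.
shift_distance = wasserstein_1d([0.0, 1.0, 2.0], [5.0, 6.0, 7.0], p=2)
```

The distance depends only on the measures, not on the order in which the samples are listed, so `wasserstein_1d([3, 1, 2], [1, 2, 3])` is zero.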
Kolmogorov Consistency Theorem
How can one define a probability measure on a product of infinitely many measurable spaces? That is, given a family of measurable spaces $(\Omega_t,\mathcal F_t)$, $t\in T$, suppose that for every nonempty finite subset $S\subset T$, a probability measure $\mathbb P_S$ has already been defined on the finite product space $\left(\prod_{t\in S}\Omega_t,\prod_{t\in S}\mathcal F_t\right)$. How can we then define a “suitable” probability measure $\mathbb P$ on the product measurable space
$$
\left(\prod_{t\in T}\Omega_t,\prod_{t\in T}\mathcal F_t\right)
$$
such that its marginal distribution measures are exactly $\mathbb P_S$?
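A necessary condition for such a $\mathbb P$ to exist is that the family $(\mathbb P_S)$ is consistent: for finite $S\subset S'$, projecting $\mathbb P_{S'}$ onto the coordinates in $S$ must give back $\mathbb P_S$. The following sketch checks this for a toy family of product measures; the index set, the two-point spaces, and the `bias` values are hypothetical and serve only to illustrate the condition.

```python
from itertools import product

# Hypothetical example: T = {0, 1, 2}, each Omega_t = {0, 1}, and
# coordinate t equals 1 with probability bias[t].
bias = {0: 0.5, 1: 0.3, 2: 0.8}

def product_measure(S):
    """P_S as a dict mapping outcomes (tuples indexed by sorted S) to probabilities."""
    S = sorted(S)
    measure = {}
    for outcome in product([0, 1], repeat=len(S)):
        prob = 1.0
        for t, x in zip(S, outcome):
            prob *= bias[t] if x == 1 else 1.0 - bias[t]
        measure[outcome] = prob
    return measure

def marginal(measure, S_big, S_small):
    """Push P_{S_big} forward under the coordinate projection onto S_small."""
    S_big = sorted(S_big)
    keep = [S_big.index(t) for t in sorted(S_small)]
    result = {}
    for outcome, prob in measure.items():
        key = tuple(outcome[i] for i in keep)
        result[key] = result.get(key, 0.0) + prob
    return result

P_012 = product_measure({0, 1, 2})
P_02 = product_measure({0, 2})
projected = marginal(P_012, {0, 1, 2}, {0, 2})
is_consistent = all(abs(projected[k] - P_02[k]) < 1e-12 for k in P_02)
```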
Monotone Class Theorem for Sets
Unless otherwise specified, all classes of sets in this article refer to classes of subsets of $\Omega$.
Probability Measure on the Path Space
Definition 1: Let $(\Omega,\mathscr{F},\mathbb{P})$ be a probability space, and let $(X_t)_{t\in T}$ be a family of random variables on $(\Omega,\mathscr{F},\mathbb{P})$, that is, for every $t\in T$, the map
$$
X_t: (\Omega,\mathscr{F})\to (\mathbb{R},\mathcal{B}(\mathbb{R}))
$$
is measurable. Then the map
$$
\begin{aligned}
X:T\times\Omega&\to \mathbb{R}\\
(t,\omega)&\mapsto X_t(\omega)
\end{aligned}
$$
is called a stochastic process. For fixed $\omega\in \Omega$, we call
$$
\begin{aligned}
X_\omega:T&\to \mathbb{R}\\
t&\mapsto X(t,\omega)
\end{aligned}
$$
a path, and call the collection of all paths
$$
\mathbb{D} := \{X_\omega\ ;\ \omega\in \Omega\}
$$
the path space.
Remark: Each element $X_\omega$ of the path space is a mapping from $T$ to $\mathbb{R}$.
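The definition can be made concrete on a finite sample space. The sketch below assumes a hypothetical two-step simple random walk with $T=\{0,1,2\}$ and $\Omega=\{-1,1\}^2$; the names `X`, `path`, and `path_space` mirror the notation above but are otherwise choices made for this example.

```python
from itertools import product

# Hypothetical example: two coin flips, T = {0, 1, 2}.
Omega = list(product([-1, 1], repeat=2))  # each omega is a pair of steps
T = [0, 1, 2]

def X(t, omega):
    """Simple random walk: X_t(omega) is the sum of the first t steps."""
    return sum(omega[:t])

def path(omega):
    """The path X_omega: t -> X(t, omega), stored as a tuple indexed by T."""
    return tuple(X(t, omega) for t in T)

# The path space collects the paths of all omega in Omega.
path_space = {path(omega) for omega in Omega}
```

Here each path is a function on $T$, represented as a tuple of its three values, and distinct $\omega$ happen to give distinct paths; in general several $\omega$ may share the same path.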