Krylov Subspace and GMRES Algorithm
Large-scale sparse linear systems arise in many fields, and solving them becomes very difficult when the problem size is extremely large. Fortunately, Krylov subspace methods handle such problems very well, and Krylov subspace iteration has accordingly been named one of the ten greatest algorithms of the 20th century. In this article, we will introduce the basic idea of the Krylov subspace and the famous GMRES algorithm (Generalized Minimal Residual method).
We will mainly consider the following linear system problem:
$$
Ax = b,
$$
where $A\in \mathbb{R}^{n\times n},\ b\in \mathbb{R}^n$, and $A$ is invertible, with $n$ being large (for example, $n\approx 10^8$).
The main idea of the GMRES method is to transform the above $n$-dimensional linear system problem into a relatively small $m$-dimensional least-squares problem through the Krylov subspace. To this end, we first introduce the basic idea of the Krylov subspace.
Krylov Subspace
Before introducing the Krylov subspace, let us first recall the Cayley-Hamilton theorem and the definition of the minimal polynomial.
Theorem 1 (Cayley-Hamilton Theorem) Let $f(x)$ be the characteristic polynomial of the matrix $A\in \mathbb{R}^{n\times n}$. Then $f(A)=O$.
The Cayley-Hamilton theorem tells us that any square matrix satisfies its own characteristic polynomial.
Definition 1 (Minimal Polynomial) The monic polynomial $\mu(x)$ of smallest degree such that $\mu(A)=O$ is called the minimal polynomial of the matrix $A\in \mathbb{R}^{n\times n}$.
If you have studied linear algebra, you will certainly know the following two results:
Proposition 1: Let $f(x)$ be the characteristic polynomial of the matrix $A\in \mathbb{R}^{n\times n}$, and let $\mu(x)$ be the minimal polynomial of $A$. Then $\mu(x)\mid f(x)$, and hence $\operatorname{deg}\mu(x)\le n$.
Proposition 2: Let $\mu(x)$ be the minimal polynomial of the matrix $A\in \mathbb{R}^{n\times n}$. Then $A$ is invertible if and only if the constant term of $\mu(x)$ is nonzero, that is, $\mu(0)\neq 0$.
Proof: The proofs of these two propositions are standard exercises and are omitted here.
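As a concrete illustration of both propositions, take $A=I_n$: the characteristic polynomial is $f(x)=(x-1)^n$, while the minimal polynomial is $\mu(x)=x-1$, so $\mu(x)\mid f(x)$ with $\operatorname{deg}\mu(x)=1\le n$, and $\mu(0)=-1\neq 0$, consistent with the invertibility of $I_n$.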
Assume $\operatorname{deg}\mu(x)=p$. For the linear system $Ax=b$, suppose we start from an initial guess $x_0$ for the solution; let $r_0=b-Ax_0$ denote the corresponding residual, and write the solution as $x=x_0+x_m$. Then
$$
Ax=b\iff A(x_0+x_m)=b\iff Ax_m=r_0
$$
Now let us observe what properties $x_m$ has.
Let $\mu(x) = x^p+c_{p-1}x^{p-1}+\cdots+c_1x+c_0$. By Proposition 2 and the invertibility of $A$, we know that $c_0\neq 0$. By the definition of the minimal polynomial, we know that
$$
\mu(A) = A^p+c_{p-1}A^{p-1}+\cdots+c_1A+c_0I=O.
$$
Therefore,
$$
\begin{aligned}
0=\mu(A)x_m&=A^px_m+c_{p-1}A^{p-1}x_m+\cdots+c_1Ax_m+c_0x_m\\
&=A^{p-1}r_0+c_{p-1}A^{p-2}r_0+\cdots+c_1r_0+c_0x_m,
\end{aligned}
$$
where the second equality uses $Ax_m=r_0$ termwise. Hence,
$$
x_m = -\frac{1}{c_0}A^{p-1}r_0-\frac{c_{p-1}}{c_{0}}A^{p-2}r_0-\cdots-\frac{c_1}{c_0}r_0\quad(1)
$$
Thus, we see that $x_m\in \operatorname{span}\{r_0,Ar_0,\cdots,A^{p-1}r_0\}$. This leads us to the definition of the Krylov subspace:
Definition 2 (Krylov Subspace) For a positive integer $m$ and $A\in \mathbb{R}^{n\times n},\ r_0\in \mathbb{R}^n$, define
$$
K_m(A,r_0) =\operatorname{span}\{r_0,Ar_0,\cdots,A^{m-1}r_0\},
$$
which is called the Krylov subspace.
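To make the definition concrete, here is a minimal NumPy sketch (the function name `krylov_matrix` is ours, not a library routine) that assembles the matrix whose columns span $K_m(A,r_0)$:

```python
import numpy as np

def krylov_matrix(A, r0, m):
    """Return the n-by-m matrix [r0, A r0, ..., A^{m-1} r0]."""
    cols = [r0]
    for _ in range(m - 1):
        cols.append(A @ cols[-1])  # multiply the previous column by A
    return np.column_stack(cols)
```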
From the definition of the Krylov subspace, it is not difficult to see the following fact:
$$
K_1(A,r_0)\subseteq K_2(A,r_0)\subseteq\cdots\subseteq K_m(A,r_0)\subseteq K_{m+1}(A,r_0)\subseteq\cdots
$$
So, as $m$ increases, will the dimension of the Krylov subspace keep increasing? The answer is no. Using the definition of the minimal polynomial, it is not hard to obtain the result:
Proposition 3: If $\operatorname{deg}\mu(x)=p$, then
$$
K_p(A,r_0) =K_{p+1}(A,r_0)=K_{p+2}(A,r_0)=\cdots
$$
Proof: Since the minimal polynomial is monic of degree $p$ and $\mu(A)=O$, the matrix $A^p$ can be expressed as a linear combination of $A^{p-1},\cdots,A,I$. Therefore, $A^pr_0$ can be expressed as a linear combination of $A^{p-1}r_0,\cdots,Ar_0,r_0$, and hence $K_p(A,r_0) =K_{p+1}(A,r_0)$. Multiplying by $A$ and repeating the argument gives $K_p(A,r_0)=K_{p+1}(A,r_0)=K_{p+2}(A,r_0)=\cdots$.
Definition 3: For given $A\in \mathbb{R}^{n\times n},\ r_0\in \mathbb{R}^n$, we call
$$
m = \min\{s\in\mathbb{N}\ |\ K_s(A,r_0)=K_{s+1}(A,r_0)\}
$$
the degree of $r_0$ with respect to $A$.
Remark 1: According to the definition of $m$, it is not difficult for the reader to prove that $m=\dim K_m(A,r_0)$.
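The stabilization in Proposition 3 and Remark 1 is easy to observe numerically. In the following sketch (the test matrix and vector are our own choices), $A=\operatorname{diag}(1,2,2,3)$ has only three distinct eigenvalues, so for $r_0=(1,1,1,1)^T$ the Krylov dimension stops growing at $3<n=4$:

```python
import numpy as np

# Only 3 distinct eigenvalues are "seen" by r0, so dim K_s stabilizes at 3.
A = np.diag([1.0, 2.0, 2.0, 3.0])
r0 = np.ones(4)

K = r0.reshape(-1, 1)                    # columns: r0, A r0, A^2 r0, ...
for s in range(1, 7):
    print(f"dim K_{s} =", np.linalg.matrix_rank(K))
    K = np.column_stack([K, A @ K[:, -1]])
# Output: 1, 2, 3, 3, 3, 3
```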
Since we want to transform the $n$-dimensional linear system into a problem of much smaller scale through the Krylov subspace, we hope that the positive integer $m$ above is as small as possible. Is $m$ simply the degree $p$ of the minimal polynomial? In general, no; for instance, when $r_0=0$ the chain of subspaces stabilizes immediately no matter how large $p$ is, so $m$ can be strictly smaller than $p$. However, we do have the following characterization:
Proposition 4: Let $m$ be the degree of $r_0$ with respect to $A$. Then $m$ is the smallest degree of a nonzero polynomial $g(x)$ satisfying $g(A)r_0=0$, namely,
$$
m = \min\{\operatorname{deg}g(x)\ |\ g(x)\neq 0\ \text{and}\ g(A)r_0=0\}
$$
Proof: Let
$$
\tau=\min\{\operatorname{deg}g(x)\ |\ g(x)\neq 0\ \text{and}\ g(A)r_0=0\}.
$$
On the one hand, since $m$ is the degree of $r_0$ with respect to $A$, we have
$$
K_m(A,r_0) =K_{m+1}(A,r_0)\iff \operatorname{span}\{r_0,Ar_0,\cdots,A^{m-1}r_0\}=\operatorname{span}\{r_0,Ar_0,\cdots,A^{m-1}r_0,A^mr_0\}.
$$
Therefore, $A^mr_0$ can be expressed linearly in terms of $r_0,Ar_0,\cdots,A^{m-1}r_0$, that is, there exist $c_0,\cdots,c_{m-1}$ such that
$$
A^mr_0 = c_0r_0+\cdots+c_{m-1}A^{m-1}r_0.
$$
Let $g(x) = x^m-c_{m-1}x^{m-1}-\cdots-c_0$. Then $g(A)r_0=0$, and by the definition of $\tau$, we know that $\tau\le m$.
On the other hand, by the definition of $\tau$, there exists a monic polynomial $h$ such that $\operatorname{deg}h(x)=\tau$ and $h(A)r_0=0$. Let
$$
h(x) = x^\tau+d_{\tau-1}x^{\tau-1}+\cdots+d_0,
$$
then
$$
A^\tau r_0 = -d_{\tau-1}A^{\tau-1}r_0-\cdots-d_0r_0\in K_\tau(A,r_0).
$$
Hence $K_\tau(A,r_0) =K_{\tau+1}(A,r_0)$, and by the definition of $m$, we have $m\le \tau$. Combining the above, we obtain $m=\tau$.
GMRES Algorithm
From the above analysis and equation $(1)$, we find that
$$
x_m\in K_p(A,r_0) = K_m(A,r_0)
$$
where $p$ is the degree of the minimal polynomial of $A$ and $m$ is the degree of $r_0$ with respect to $A$; note that $m\le p$ by Proposition 4 (since $\mu(A)r_0=0$), so $K_p(A,r_0)=K_m(A,r_0)$. Therefore, we can transform the problem of solving the linear system into the following optimization problem:
$$
\min_{x_m\in K_m(A,r_0)} \lVert A(x_0+x_m)-b\rVert_2
$$
This still looks like a large-scale $n$-dimensional problem. How should we reduce the dimension?
In fact, since $\dim K_m(A,r_0)=m$, we can apply Gram-Schmidt orthogonalization to the basis $\{r_0,Ar_0,\cdots,A^{m-1}r_0\}$ of $K_m(A,r_0)$, that is,
$$
\begin{aligned}
&w_1=r_0,\quad v_1 = w_1/\lVert w_1\rVert_2;\\
&w_2 = Ar_0-(Ar_0,v_1)v_1,\quad v_2=w_2/\lVert w_2\rVert_2;\\
&\cdots\\
&w_m = A^{m-1}r_0-\sum_{k=1}^{m-1}(A^{m-1}r_0,v_k)v_k,\quad v_m = w_m/\lVert w_m\rVert_2.
\end{aligned}
$$
However, in practice we do not know the value of $m$ in advance, and orthogonalizing the raw power basis $\{r_0,Ar_0,\cdots\}$ is numerically unstable, since the powers $A^{j}r_0$ quickly align with a dominant eigendirection. The Arnoldi algorithm addresses both issues by generating each new direction as $Av_j$ and orthogonalizing it on the fly.
Algorithm 1 (Arnoldi Algorithm)
Step 1: Choose a vector $v_1$ such that $\lVert v_1\rVert_2=1$, and pre-set a sufficiently large maximum number of iterations max_iteration.
Step 2: For $j=1,2,\cdots,$ max_iteration, do:
Compute $h_{ij} = (Av_j,v_i)$, $i=1,2,\cdots,j$;
Compute $w_j = Av_j-\sum\limits_{i=1}^j h_{ij}v_i$;
Compute $h_{j+1,j} = \lVert w_j\rVert_2$;
If $h_{j+1,j}=0$, terminate the algorithm; otherwise set $v_{j+1} = w_j/h_{j+1,j}$.
End Do
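The following is a minimal NumPy sketch of the Arnoldi algorithm (our own didactic implementation, using the modified Gram-Schmidt form of the orthogonalization, which is mathematically equivalent but numerically more reliable):

```python
import numpy as np

def arnoldi(A, v1, max_iteration, tol=1e-12):
    """Arnoldi iteration (Algorithm 1).

    Without breakdown, returns V (n x (m+1), orthonormal columns) and
    H ((m+1) x m, upper Hessenberg) satisfying A @ V[:, :m] == V @ H.
    On breakdown at step m, returns the square m x m block of H, and
    A @ V == V @ H then holds exactly."""
    n = len(v1)
    V = np.zeros((n, max_iteration + 1))
    H = np.zeros((max_iteration + 1, max_iteration))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(max_iteration):
        w = A @ V[:, j]
        for i in range(j + 1):            # modified Gram-Schmidt
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < tol:             # breakdown: the degree is reached
            return V[:, :j + 1], H[:j + 1, :j + 1]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H
```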
From the iterative process of the Arnoldi algorithm, it is not difficult to see that, when $v_1=r_0/\lVert r_0\rVert_2$, the algorithm terminates exactly at step $m$, the degree of $r_0$ with respect to $A$, and
$$
A(v_1,v_2,\cdots,v_m) = (v_1,v_2,\cdots,v_{m+1})\begin{pmatrix}
h_{11}&h_{12}&h_{13}&\cdots&h_{1,m-1}&h_{1m}\\
h_{21}&h_{22}&h_{23}&\cdots&h_{2,m-1}&h_{2m}\\
&h_{32}&h_{33}&\cdots&h_{3,m-1}&h_{3m}\\
&&h_{43}&\cdots&h_{4,m-1}&h_{4m}\\
&&&\ddots&\vdots&\vdots\\
&&&&h_{m,m-1}&h_{mm}\\
&&&&&h_{m+1,m}
\end{pmatrix}
$$
Let $V_m = (v_1,v_2,\cdots,v_m)$, whose columns form an orthonormal basis of $K_m(A,r_0)$, and let $H_m$ denote the $(m+1)\times m$ upper-Hessenberg matrix on the right-hand side. Then
$$
AV_m = V_{m+1}H_m\quad (2)
$$
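Relation $(2)$ is easy to verify numerically with the `arnoldi` sketch above (the random test problem is our own choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 10
A = rng.standard_normal((n, n))
r0 = rng.standard_normal(n)

V, H = arnoldi(A, r0, m)                     # arnoldi() from the sketch above
print(np.allclose(A @ V[:, :m], V @ H))      # True: A V_m = V_{m+1} H_m
print(np.allclose(V.T @ V, np.eye(m + 1)))   # True: columns are orthonormal
```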
For $r_0 = b-Ax_0$, let $\beta = \lVert r_0\rVert_2$, take $v_1 = r_0/\beta$, and let $e_1=(1,0,\cdots,0)^T$ denote the first standard basis vector of the appropriate dimension. Running the Arnoldi algorithm from this $v_1$ produces $V_m$. Since $x_m\in K_m(A,r_0)$, there exists $y\in \mathbb{R}^m$ such that $x_m=V_my$. In addition, $r_0=\beta v_1 = \beta V_{m+1} e_1$. Hence
$$
\begin{aligned}
\lVert Ax-b\rVert_2&=\lVert A(x_0+x_m)-b\rVert_2\\
&=\lVert A(x_0+V_my)-b\rVert_2\\
&=\lVert AV_my-r_0\rVert_2\\
&=\lVert V_{m+1}H_my-r_0\rVert_2\\
&=\lVert V_{m+1}(H_my-\beta e_1)\rVert_2\\
&=\lVert H_my-\beta e_1\rVert_2,
\end{aligned}
$$
where the last equality holds because the columns of $V_{m+1}$ are orthonormal. In summary, we have transformed the problem of solving the $n$-dimensional linear system $Ax=b$ into the least-squares problem for the $(m+1)\times m$ matrix $H_m$:
$$
\min_{y\in\mathbb{R}^m}\lVert H_my-\beta e_1\rVert_2
$$
thus achieving dimensionality reduction.
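The reduced problem is small enough for a dense solver. A sketch (again using `arnoldi` from above; the test problem is our own, with a diagonal shift to keep $A$ well conditioned):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 10
A = rng.standard_normal((n, n)) + n * np.eye(n)  # shift keeps A well conditioned
b = rng.standard_normal(n)
x0 = np.zeros(n)

r0 = b - A @ x0
beta = np.linalg.norm(r0)
V, H = arnoldi(A, r0, m)                      # arnoldi() normalizes r0 internally

rhs = np.zeros(H.shape[0]); rhs[0] = beta     # beta * e_1
y, *_ = np.linalg.lstsq(H, rhs, rcond=None)   # the (m+1) x m least squares
x = x0 + V[:, :m] @ y                         # lift y back to R^n
print(np.linalg.norm(b - A @ x))              # residual after m Arnoldi steps
```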
Summarizing the above process, we obtain the GMRES algorithm.
Algorithm 2 (GMRES)
Step 1: Compute $r_0=b-Ax_0$, $\beta = \lVert r_0\rVert_2$, $v_1=r_0/\beta$, and pre-set a sufficiently large maximum number of iterations max_iteration;
Step 2: For $j=1,2,\cdots,$ max_iteration, do:
- Compute $w_j=Av_j$;
- For $i=1,2,\cdots,j$: compute $h_{ij}=(w_j,v_i)$ and update $w_j\leftarrow w_j-h_{ij}v_i$;
- Compute $h_{j+1,j} = \lVert w_j\rVert_2$;
- If $h_{j+1,j}=0$, then let $m=j$ and go to Step 3; otherwise set $v_{j+1}=w_j/h_{j+1,j}$;
End Do
Step 3: Form the $(m+1)\times m$ upper-Hessenberg matrix $H_m = \{h_{ij}\}_{1\le i\le m+1,\ 1\le j\le m}$;
Step 4: Solve the least-squares problem
$$
y_m =\operatorname{arg}\min\limits_{y\in\mathbb{R}^m}\lVert H_my-\beta e_1\rVert_2
$$
and compute
$$
x = x_0+V_my_m.
$$
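Putting everything together gives a compact NumPy sketch of Algorithm 2 (a didactic version under our own naming; production implementations add restarts, Givens-rotation updates of the least-squares problem, and residual-based stopping tests, as in scipy.sparse.linalg.gmres):

```python
import numpy as np

def gmres_simple(A, b, x0, max_iteration, tol=1e-12):
    """GMRES (Algorithm 2), minimal didactic version."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    n = len(b)
    V = np.zeros((n, max_iteration + 1))
    H = np.zeros((max_iteration + 1, max_iteration))
    V[:, 0] = r0 / beta
    m = max_iteration
    for j in range(max_iteration):
        w = A @ V[:, j]
        for i in range(j + 1):            # modified Gram-Schmidt
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < tol:             # breakdown: exact solution reachable
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    # Step 4: solve min_y ||H_m y - beta e_1||_2 and set x = x0 + V_m y_m.
    rhs = np.zeros(m + 1); rhs[0] = beta
    y, *_ = np.linalg.lstsq(H[:m + 1, :m], rhs, rcond=None)
    return x0 + V[:, :m] @ y
```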
The cover image of this article was taken in Adelboden, Switzerland.
