A Semigroup Approach to Finite Markov Chains

Posted by Tom Leinster

$MathML-enabled post (click for more details).$

A finite Markov chain is a very simple kind of transition system:

It can be encoded as a square matrix of nonnegative reals, each of whose rows sums to 1 (a ‘stochastic matrix’). Here you’ll meet the fundamental results on finite Markov chains, proved in a way that appears to be new. — TL

I’m going to prove all the standard finite Markov chain convergence results using the theory of compact semigroups. This approach may be known, but I couldn’t find it. I don’t obtain any of the bounds on convergence, although I suppose one can obtain them via this method with a little work.

$MathML-enabled post (click for more details).$

Compact semigroups

By a compact semigroup we mean a non-empty semigroup $S$ with a compact Hausdorff topology and jointly continuous multiplication $S\times S\to S$ . We shall assume for simplicity that the topology on $S$ is metric since our main examples have this property. The following simple exercise from topology will be used without comment in the sequel.

Lemma 1 Let $(X,d)$ be a compact metric space and $\{x_n\}$ a sequence in $X$ . Then $x_n$ converges to $x\in X$ if and only if every convergent subsequence of $\{x_n\}$ converges to $x$ .

For example, it is easy to prove from this lemma that if $S$ is a compact semigroup which is algebraically a group, then the inverse is continuous. Indeed, if $s_n\to s$ and $\tilde{s}$ is a limit of some subsequence of $\{s_n^{-1}\}$ , then $\tilde{s}s=1=s\tilde{s}$ and so $\tilde{s}=s^{-1}$ . By the lemma, it follows that $s_n^{-1}\to s^{-1}$ .

A compact semigroup is called monothetic if it is topologically generated by a single element, that is, $S=\overline{\langle s\rangle}$ for some $s\in S$ where, as usual, $\langle s\rangle =\{s^n\mid n\geq 1\}$ is the subsemigroup generated by $s$ . One easily checks that $S$ is commutative. In fact, the structure of $S$ is well known and we record it in the following theorem, due independently to Koch and Numakura without the metric assumption.

Theorem 2 (Koch, Numakura) Let $S=\overline{\langle s\rangle}$ be a monothetic compact (metric) semigroup. Then $S$ has a unique minimal ideal $I$ , namely the set of all limits of convergent subsequences of $\{s^n\}$ . Moreover, $I$ is a compact abelian group. The identity $s^{\omega}$ of $I$ is the unique idempotent of $S$ . Furthermore, $I=s^{\omega}S$ and the map $x\mapsto s^{\omega}x$ is a surjective continuous homomorphism $S\to I$ and so in particular $I$ is a monothetic compact abelian group.

Proof: The sets $s^n S$ with $n\geq 1$ form a decreasing chain of closed ideals and hence have non-empty intersection by compactness. Put $I=\bigcap_{n\geq 1} s^n S$ ; clearly $I$ is a closed ideal. We claim it consists precisely of all limits of convergent subsequences of $\{s^n\}$ .

Indeed, suppose that $x=\lim s^{n_k}$ and $n\geq 1$ . Then by passing to a subsequence we may assume that $n_k > n$ for all $k$ and $\{s^{n_k-n}\}$ converges to an element $y\in S$ . Then $x=s^n y$ by continuity of multiplication. We conclude that $x\in I$ . For the converse, first observe that every element of $S\setminus \{s^k\mid k\geq 1\}$ is a limit of a subsequence of $\{s^n\}$ . Thus it suffices to prove that if $s^m\in I$ for some $m\geq 1$ , then $s^m$ is a limit of a convergent subsequence of $\{s^n\}$ . But then we have $s^m=s^m y$ for some $y\in S$ . If $y=\lim s^{n_k}$ with $n_k$ an increasing sequence, then $s^m=\lim s^{m+n_k}$ . Otherwise, $s^m=s^{m+k}$ for some $k\geq 1$ . But then $s^{m+n k}=s$ for all $n\geq 1$ and so $s^{m+n k}\to s^m$ . Thus $I$ consists of all limits of convergent subsequences of $\{s^n\}$ .

We next show that $I$ is algebraically a group. It will then follow that it is a compact abelian group. To do this, it suffice to show (by commutativity) that $a x=b$ has a solution $x\in I$ for all $a,b\in I$ . Suppose $a=\lim s^{n_k}$ and $b=\lim s^{m_k}$ . Passing to a subsequence, we may assume that $m_k > n_k$ for all $k$ . Extracting further subsequences, we may assume $s^{m_k-n_k}$ converges to an element $x\in S$ . Then $a x=\lim s^{n_k} s^{m_k-n_k} = \lim s^{m_k} = b.$ This completes the proof that $I$ is a group.

Let us show that $I$ is the unique minimal ideal. Let $J$ be an ideal of $S$ . Then $\emptyset\neq I J\subseteq I\cap J\subseteq J$ . But $I$ is a group and so has no proper ideals. Thus $I J=I$ and so $I\subseteq J$ .

Suppose that $f\in S$ is an idempotent. From $f=\lim f^n$ , it is immediate that $f$ is a limit of a subsequence of $\{s^n\}$ and hence belongs to $I$ . But $I$ is a group and therefore contains a unique idempotent, its identity. Clearly $s^{\omega} S \subseteq I$ and so $s^{\omega}S=I$ by minimality. For $x,y\in S$ , one has $s^{\omega} x s^{\omega} y = (s^{\omega})^2 x y = s^{\omega} x y$ and so $x\mapsto s^{\omega}x$ is a continuous surjective homomorphism $S\to s^{\omega}S=I$ . This completes the proof.

Let $S=\overline{\langle s\rangle}$ be a compact monothetic semigroup (which we continue to assume to be metric) and let $S\times X\to X$ be a faithful continuous left action of $S$ on a compact Hausdorff space $X$ . Then $s^{\omega}X$ is a closed $S$ -invariant subspace, known as the eventual range of $s$ , and $S$ acts on $s^{\omega}X$ by homeomorphisms. Moreover, the action of $S$ on $s^{\omega}X$ factors through the homomorphism $S\to s^{\omega}S$ from Theorem 2.

Notice that the fixed-point set of $s$ is contained in the eventual range. Indeed, $s x=x$ implies $s^n x=x$ for all $n\geq 1$ and so $s^{\omega}x=x$ . It follows that if the eventual range of $s$ is a singleton, that is, if $s^{\omega}$ acts as a constant map, then $s$ has a unique fixed-point. Indeed, if $s^{\omega}X=\{x_0\}$ , then since the eventual range of $s$ is $S$ -invariant, clearly $s x_0=x_0$ . But the above discussion shows that each fixed point of $s$ belongs to the eventual range and so $x_0$ is the unique fixed-point of $S$ . In this case, $s^{\omega}S=\{s^{\omega}\}$ (by faithfulness of the action) and hence $s^n\to s^{\omega}$ in light of Theorem 2 and Lemma 1. We summarize this discussion in the following proposition.

Proposition 3 Let $S=\overline {\langle s\rangle}$ be a monothetic compact semigroup acting faithfully on the left on a compact Hausdorff space $X$ and suppose that $s^{\omega}$ acts as a constant map with image $x_0$ , i.e., $\{x_0\}$ is the eventual range of $s$ . Then $x_0$ is the unique fixed-point of $s$ on $X$ , $s^n\to s^{\omega}$ (uniformly) and $s^n x\to x_0$ for all $x\in X$ . The dual statement holds for right actions.

As a warmup exercise let us give a quick proof of the Banach fixed point theorem for compact metric spaces. Let $(X,d)$ be a compact metric space. A mapping $f\colon X\to X$ is a weak contraction if $d(f(x),f(y))\leq d(x,y)$ for all $x,y\in X$ . If the inequality is strict for $x\neq y$ , then $f$ is said to be a (strict) contraction. The weak contractions form a submonoid $WCtr(X)$ of the monoid of continuous self-maps of $X$ , which is easily verified to be closed in the compact-open topology (and hence is metric via the sup norm). Since the weak contractions clearly form an equicontinuous family, the monoid $WCtr(X)$ is in fact compact by the Arzelà–Ascoli theorem. The weak contractions form an ideal $J$ of $WCtr(X)$ .

Let $f\colon X\to X$ be a weak contraction such that $f^N$ is a contraction for some $N\geq 1$ . Then $S=\overline{\langle f\rangle}$ is a compact monothetic subsemigroup of $WCtr(X)$ . Moreover, if $I$ is the minimal ideal of $S$ , then $I\subseteq f^N S\subseteq J$ and hence consists of contractions. In particular, $f^{\omega}$ is a contraction. But the image of an idempotent is its fixed-point set and it is easily checked that a contraction has at most one fixed-point. Thus $f^{\omega}$ is a constant map to some point $x_0$ . By Proposition 3, $x_0$ is the unique fixed-point of $f$ and $f^n(x)\to x_0$ for all $x\in X$ . This is the Banach fixed-point theorem for compact metric spaces.

Stochastic matrices

Let $\mathcal{P}_n$ be the space of all probability distributions on $\{1,\ldots,n\}$ . We may identify it with the subspace of (row) vectors in $\mathbb{R}^n$ with non-negative entries summing to $1$ . That is, $\mathcal{P}_n$ is the standard $(n-1)$ -simplex in $\mathbb{R}^n$ and hence is compact. Let $\mathcal{S}_n$ be the set of all $n\times n$ stochastic matrices, that is, the set of all square matrices whose rows belong to $\mathcal{P}_n$ . Then $\mathcal{S}_n$ is a compact monoid (homeomorphic to $\mathcal{P}_n^n$ ) with respect to matrix multiplication. Moreover, it is a partially ordered monoid with respect to the usual ordering ( $A\geq B$ if $A-B\geq 0$ ) on non-negative matrices.

The natural right action $\mathbb{R}^n\times \mathcal{S}_n\to \mathbb{R}^n$ is clearly continuous and $\mathcal{P}_n$ is a compact invariant subspace. Moreover, the action of $\mathcal{S}_n$ on $\mathcal{P}_n$ is faithful because $\mathcal{P}_n$ contains the standard unit vectors. Thus we can also view $\mathcal{S}_n$ as a compact monoid of transformations of both $\mathcal{P}_n$ and $\mathbb{R}^n$ with respect to the compact-open topology.

Let $J$ denote the matrix of all ones; note that $P J=J$ for any stochastic matrix $P$ .

Lemma 4 Let $\varepsilon \gt 0$ . Then $I_{\varepsilon} = \{Q\in \mathcal{S}_n \mid Q \geq \varepsilon J \}$ is a closed left ideal of $\mathcal{S}_n$ .

Proof: Clearly $I_{\varepsilon}$ is closed. If $P,Q\in \mathcal{S}_n$ with $Q\in I_{\varepsilon}$ , then $P Q\geq P (\varepsilon J)=\varepsilon J$ and so $P Q\in I_{\varepsilon}$ , as required.

Consequently, the positive stochastic matrices form a left ideal of $\mathcal{S}_n$ (topologically open). A probability distribution $\pi\in \mathcal{P}_n$ is stationary for a stochastic matrix $P$ if $\pi P=\pi$ , i.e., it is a fixed-point of $P$ .

Lemma 5 An idempotent positive stochastic matrix $Q$ has a unique stationary distribution $\pi$ . Moreover, $\pi$ is strictly positive, each row of $Q$ is $\pi$ , and all fixed vectors of $Q$ are multiples of $\pi$ . Consequently, $Q$ has rank $1$ and acts as a constant map on $\mathcal{P}_n$ with image $\pi$ .

Proof: Let $Q_{i j}$ be the largest entry in column $j$ . Then $\sum_{k=1}^n Q_{i k}(Q_{i j}-Q_{k j}) = \sum_{k=1}^n Q_{i k} Q_{i j}-\sum_{k=1}^n Q_{i k}Q_{k j} = Q_{i j}-Q_{i j}=0$ and hence $Q_{i j}=Q_{k j}$ , for $k=1,\ldots,n$ , as each $Q_{i k} \gt 0$ and $Q_{i j}-Q_{k j}\geq 0$ . Thus all rows of $Q$ are $\pi$ for some strictly positive probability distribution $\pi$ . Idempotency immediately yields $\pi Q=\pi$ . Moreover, since $Q$ is a rank $1$ idempotent, every fixed vector of $Q$ must be a multiple of $\pi$ . Consequently, $\pi$ is the unique stationary distribution for $Q$ . Since the image of an idempotent mapping is its fixed-point set, we conclude that $Q$ acts on $\mathcal{P}_n$ as a constant map with image $\pi$ .

A stochastic matrix $P$ is said to be ergodic if $P^m \gt 0$ for some $m$ . Now we can prove one of the main theorems of finite Markov chain theory. Recall that an eigenvalue of a matrix is simple if it has multiplicity one as a root of the characteristic polynomial.

Theorem 6 Let $P$ be an ergodic stochastic matrix. Then:

$P$ has a unique stationary distribution $\pi$ ;

$\pi$ is strictly positive;

$P^n\to \Pi$ where $\Pi$ is the matrix all of whose rows are $\pi$ ;

if $\mu$ is any probability distribution, then $\mu P^n\to \pi$ ;

the eigenvalue $1$ of $P$ is simple and all other eigenvalues of $P$ have modulus strictly less than $1$ .

Proof: Let $P\in \mathcal{S}_n$ with $P^m \gt 0$ . Put $S=\overline{\langle P\rangle}$ and let $I$ be the minimal ideal of $S$ . As $I\subseteq S P^m$ , it follows from Lemma 4 that $I$ consists of positive elements. Thus $P^{\omega} \gt 0$ and so Lemma 5 implies that $P^{\omega}$ is a constant map on $\mathcal{P}_n$ with image $\pi \gt 0$ its stationary distribution. Proposition 3 then yields that $\pi$ is the unique stationary distribution for $P$ , that $P^n\to P^{\omega}$ and that $\mu P^n\to \pi$ for all $\mu\in \mathcal{P}_n$ . Moreover, $P^{\omega}=\Pi$ by Lemma 5.

For the final statement, observe that $\mathbb{R}^n = \mathbb{R}^n P^{\omega} \oplus \ker P^{\omega} = \mathbb{R}\pi \oplus \ker P^{\omega}$ (the last equality by Lemma 5). Moreover, $P$ preserves this direct sum decomposition because $P$ and $P^{\omega}$ commute. Set $V=\ker P^{\omega}$ . From the fact that $P^n\to P^{\omega}$ , it follows that the powers of the restriction $P|_V$ of $P$ to $V$ converge to $0$ and hence $P|_V$ has spectral radius strictly less than $1$ . Since the characteristic polynomial of $P$ is the product of $x - 1$ and the characteristic polynomial of $P|_V$ , it follows that $1$ is a simple eigenvalue of $P$ .

A stochastic matrix $P$ is called irreducible if, for all $i,j$ , there exists $k \gt 0$ such that $P^k_{i j} \gt 0$ . It is easy to verify that if $P$ is irreducible, then $Q=(1/2) I_n + (1/2) P$ is ergodic, where $I_n$ is the $n\times n$ identity matrix. Observe that $v Q=v$ if and only if $v=v P$ for any vector $v$ . This leads to the existence and uniqueness of stationary distributions for irreducible Markov chains.

Corollary 7 Let $P$ be an irreducible stochastic matrix. Then $P$ has a unique stationary distribution $\pi$ . Moreover, $\pi$ is strictly positive and every fixed vector of $P$ is a multiple of $\pi$ .

Posted at January 3, 2012 10:31 PM UTC

TrackBack URL for this Entry: https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/2484

Re: A Semigroup Approach to Finite Markov Chains

This is very nice! Thanks!

I’ll make just a trivial terminological comment for the moment. When you started using the phrase “eventual range” in earlier threads, I didn’t realize that it was actually an accepted term. I thought you were just taking what I was calling the eventual image and adapting it to your preference for the word “range” over “image”. But I see now that “eventual range” is a term in use, e.g. it’s defined on page 242 of Lind and Marcus’s book on symbolic dynamics.

(As far as I was concerned, “eventual image” was my own coinage. I googled it a while ago, and discovered that a few other people had used it too, but I didn’t think of trying it with “range” in place of “image”.)

Posted by: Tom Leinster on January 3, 2012 11:07 PM | Permalink | Reply to this

Can the methods used here be related to the metric coinduction of Kozen and Ruozzi?

Posted by: David Corfield on January 4, 2012 9:32 AM | Permalink | Reply to this

Good question. I have no idea.

Posted by: Benjamin Steinberg on January 5, 2012 12:52 AM | Permalink | Reply to this

Looks cool, and though it is a bit beyond me, does it relate to earlier work on markov chains and semigroups?

http://u.math.biu.ac.il/~margolis/Representation%20Theory%20Seminar/Semigroups%20and%20Ring%20Theoretic%20Methods%20in%20Probability.pdf

Posted by: Trurl on January 4, 2012 6:41 PM | Permalink | Reply to this

Yes and no. If

S

is any compact semigroup then the probability measures on S form a compact semigroup under convolution. The stochastic matrices essentially form a special case. Some of the convergence results of Diaconis and Brown can be understood from this viewpoint but you need semigroup representation theory for the bounds on rate of convergence.

Posted by: Benjamin Steinberg on January 5, 2012 12:58 AM | Permalink | Reply to this

The n-Category Café

Skip to the Main Content

January 3, 2012