Skip to the Main Content

Note:These pages make extensive use of the latest XHTML and CSS Standards. They ought to look great in any standards-compliant modern browser. Unfortunately, they will probably look horrible in older browsers, like Netscape 4.x and IE 4.x. Moreover, many posts use MathML, which is, currently only supported in Mozilla. My best suggestion (and you will thank me when surfing an ever-increasing number of sites on the web which have been crafted to use the new standards) is to upgrade to the latest version of your browser. If that's not possible, consider moving to the Standards-compliant and open-source Mozilla browser.

January 3, 2012

A Semigroup Approach to Finite Markov Chains

Posted by Tom Leinster

Guest post by Benjamin Steinberg

A finite Markov chain is a very simple kind of transition system:

It can be encoded as a square matrix of nonnegative reals, each of whose rows sums to 1 (a ‘stochastic matrix’). Here you’ll meet the fundamental results on finite Markov chains, proved in a way that appears to be new. — TL

I’m going to prove all the standard finite Markov chain convergence results using the theory of compact semigroups. This approach may be known, but I couldn’t find it. I don’t obtain any of the bounds on convergence, although I suppose one can obtain them via this method with a little work.

Compact semigroups

By a compact semigroup we mean a non-empty semigroup SS with a compact Hausdorff topology and jointly continuous multiplication S×SSS\times S\to S. We shall assume for simplicity that the topology on SS is metric since our main examples have this property. The following simple exercise from topology will be used without comment in the sequel.

Lemma 1  Let (X,d)(X,d) be a compact metric space and {x n}\{x_n\} a sequence in XX. Then x nx_n converges to xXx\in X if and only if every convergent subsequence of {x n}\{x_n\} converges to xx.

For example, it is easy to prove from this lemma that if SS is a compact semigroup which is algebraically a group, then the inverse is continuous. Indeed, if s nss_n\to s and s˜\tilde{s} is a limit of some subsequence of {s n 1}\{s_n^{-1}\}, then s˜s=1=ss˜\tilde{s}s=1=s\tilde{s} and so s˜=s 1\tilde{s}=s^{-1}. By the lemma, it follows that s n 1s 1s_n^{-1}\to s^{-1}.

A compact semigroup is called monothetic if it is topologically generated by a single element, that is, S=s¯S=\overline{\langle s\rangle} for some sSs\in S where, as usual, s={s nn1}\langle s\rangle =\{s^n\mid n\geq 1\} is the subsemigroup generated by ss. One easily checks that SS is commutative. In fact, the structure of SS is well known and we record it in the following theorem, due independently to Koch and Numakura without the metric assumption.

Theorem 2 (Koch, Numakura)  Let S=s¯S=\overline{\langle s\rangle} be a monothetic compact (metric) semigroup. Then SS has a unique minimal ideal II, namely the set of all limits of convergent subsequences of {s n}\{s^n\}. Moreover, II is a compact abelian group. The identity s ωs^{\omega} of II is the unique idempotent of SS. Furthermore, I=s ωSI=s^{\omega}S and the map xs ωxx\mapsto s^{\omega}x is a surjective continuous homomorphism SIS\to I and so in particular II is a monothetic compact abelian group.

Proof:  The sets s nSs^n S with n1n\geq 1 form a decreasing chain of closed ideals and hence have non-empty intersection by compactness. Put I= n1s nSI=\bigcap_{n\geq 1} s^n S; clearly II is a closed ideal. We claim it consists precisely of all limits of convergent subsequences of {s n}\{s^n\}.

Indeed, suppose that x=lims n kx=\lim s^{n_k} and n1n\geq 1. Then by passing to a subsequence we may assume that n k>nn_k > n for all kk and {s n kn}\{s^{n_k-n}\} converges to an element ySy\in S. Then x=s nyx=s^n y by continuity of multiplication. We conclude that xIx\in I. For the converse, first observe that every element of S{s kk1}S\setminus \{s^k\mid k\geq 1\} is a limit of a subsequence of {s n}\{s^n\}. Thus it suffices to prove that if s mIs^m\in I for some m1m\geq 1, then s ms^m is a limit of a convergent subsequence of {s n}\{s^n\}. But then we have s m=s mys^m=s^m y for some ySy\in S. If y=lims n ky=\lim s^{n_k} with n kn_k an increasing sequence, then s m=lims m+n ks^m=\lim s^{m+n_k}. Otherwise, s m=s m+ks^m=s^{m+k} for some k1k\geq 1. But then s m+nk=ss^{m+n k}=s for all n1n\geq 1 and so s m+nks ms^{m+n k}\to s^m. Thus II consists of all limits of convergent subsequences of {s n}\{s^n\}.

We next show that II is algebraically a group. It will then follow that it is a compact abelian group. To do this, it suffice to show (by commutativity) that ax=ba x=b has a solution xIx\in I for all a,bIa,b\in I. Suppose a=lims n ka=\lim s^{n_k} and b=lims m kb=\lim s^{m_k}. Passing to a subsequence, we may assume that m k>n km_k > n_k for all kk. Extracting further subsequences, we may assume s m kn ks^{m_k-n_k} converges to an element xSx\in S. Then ax=lims n ks m kn k=lims m k=b. a x=\lim s^{n_k} s^{m_k-n_k} = \lim s^{m_k} = b. This completes the proof that II is a group.

Let us show that II is the unique minimal ideal. Let JJ be an ideal of SS. Then IJIJJ\emptyset\neq I J\subseteq I\cap J\subseteq J. But II is a group and so has no proper ideals. Thus IJ=II J=I and so IJI\subseteq J.

Suppose that fSf\in S is an idempotent. From f=limf nf=\lim f^n, it is immediate that ff is a limit of a subsequence of {s n}\{s^n\} and hence belongs to II. But II is a group and therefore contains a unique idempotent, its identity. Clearly s ωSIs^{\omega} S \subseteq I and so s ωS=Is^{\omega}S=I by minimality. For x,ySx,y\in S, one has s ωxs ωy=(s ω) 2xy=s ωxys^{\omega} x s^{\omega} y = (s^{\omega})^2 x y = s^{\omega} x y and so xs ωxx\mapsto s^{\omega}x is a continuous surjective homomorphism Ss ωS=IS\to s^{\omega}S=I. This completes the proof.

Let S=s¯S=\overline{\langle s\rangle} be a compact monothetic semigroup (which we continue to assume to be metric) and let S×XXS\times X\to X be a faithful continuous left action of SS on a compact Hausdorff space XX. Then s ωXs^{\omega}X is a closed SS-invariant subspace, known as the eventual range of ss, and SS acts on s ωXs^{\omega}X by homeomorphisms. Moreover, the action of SS on s ωXs^{\omega}X factors through the homomorphism Ss ωSS\to s^{\omega}S from Theorem 2.

Notice that the fixed-point set of ss is contained in the eventual range. Indeed, sx=xs x=x implies s nx=xs^n x=x for all n1n\geq 1 and so s ωx=xs^{\omega}x=x. It follows that if the eventual range of ss is a singleton, that is, if s ωs^{\omega} acts as a constant map, then ss has a unique fixed-point. Indeed, if s ωX={x 0}s^{\omega}X=\{x_0\}, then since the eventual range of ss is SS-invariant, clearly sx 0=x 0s x_0=x_0. But the above discussion shows that each fixed point of ss belongs to the eventual range and so x 0x_0 is the unique fixed-point of SS. In this case, s ωS={s ω}s^{\omega}S=\{s^{\omega}\} (by faithfulness of the action) and hence s ns ωs^n\to s^{\omega} in light of Theorem 2 and Lemma 1. We summarize this discussion in the following proposition.

Proposition 3  Let S=s¯S=\overline {\langle s\rangle} be a monothetic compact semigroup acting faithfully on the left on a compact Hausdorff space XX and suppose that s ωs^{\omega} acts as a constant map with image x 0x_0, i.e., {x 0}\{x_0\} is the eventual range of ss. Then x 0x_0 is the unique fixed-point of ss on XX, s ns ωs^n\to s^{\omega} (uniformly) and s nxx 0s^n x\to x_0 for all xXx\in X. The dual statement holds for right actions.

As a warmup exercise let us give a quick proof of the Banach fixed point theorem for compact metric spaces. Let (X,d)(X,d) be a compact metric space. A mapping f:XXf\colon X\to X is a weak contraction if d(f(x),f(y))d(x,y)d(f(x),f(y))\leq d(x,y) for all x,yXx,y\in X. If the inequality is strict for xyx\neq y, then ff is said to be a (strict) contraction. The weak contractions form a submonoid WCtr(X)WCtr(X) of the monoid of continuous self-maps of XX, which is easily verified to be closed in the compact-open topology (and hence is metric via the sup norm). Since the weak contractions clearly form an equicontinuous family, the monoid WCtr(X)WCtr(X) is in fact compact by the Arzelà–Ascoli theorem. The weak contractions form an ideal JJ of WCtr(X)WCtr(X).

Let f:XXf\colon X\to X be a weak contraction such that f Nf^N is a contraction for some N1N\geq 1. Then S=f¯S=\overline{\langle f\rangle} is a compact monothetic subsemigroup of WCtr(X)WCtr(X). Moreover, if II is the minimal ideal of SS, then If NSJI\subseteq f^N S\subseteq J and hence consists of contractions. In particular, f ωf^{\omega} is a contraction. But the image of an idempotent is its fixed-point set and it is easily checked that a contraction has at most one fixed-point. Thus f ωf^{\omega} is a constant map to some point x 0x_0. By Proposition 3, x 0x_0 is the unique fixed-point of ff and f n(x)x 0f^n(x)\to x_0 for all xXx\in X. This is the Banach fixed-point theorem for compact metric spaces.

Stochastic matrices

Let 𝒫 n\mathcal{P}_n be the space of all probability distributions on {1,,n}\{1,\ldots,n\}. We may identify it with the subspace of (row) vectors in n\mathbb{R}^n with non-negative entries summing to 11. That is, 𝒫 n\mathcal{P}_n is the standard (n1)(n-1)-simplex in n\mathbb{R}^n and hence is compact. Let 𝒮 n\mathcal{S}_n be the set of all n×nn\times n stochastic matrices, that is, the set of all square matrices whose rows belong to 𝒫 n\mathcal{P}_n. Then 𝒮 n\mathcal{S}_n is a compact monoid (homeomorphic to 𝒫 n n\mathcal{P}_n^n) with respect to matrix multiplication. Moreover, it is a partially ordered monoid with respect to the usual ordering (ABA\geq B if AB0A-B\geq 0) on non-negative matrices.

The natural right action n×𝒮 n n\mathbb{R}^n\times \mathcal{S}_n\to \mathbb{R}^n is clearly continuous and 𝒫 n\mathcal{P}_n is a compact invariant subspace. Moreover, the action of 𝒮 n\mathcal{S}_n on 𝒫 n\mathcal{P}_n is faithful because 𝒫 n\mathcal{P}_n contains the standard unit vectors. Thus we can also view 𝒮 n\mathcal{S}_n as a compact monoid of transformations of both 𝒫 n\mathcal{P}_n and n\mathbb{R}^n with respect to the compact-open topology.

Let JJ denote the matrix of all ones; note that PJ=JP J=J for any stochastic matrix PP.

Lemma 4  Let ε>0\varepsilon \gt 0. Then I ε={Q𝒮 nQεJ}I_{\varepsilon} = \{Q\in \mathcal{S}_n \mid Q \geq \varepsilon J \} is a closed left ideal of 𝒮 n\mathcal{S}_n.

Proof:  Clearly I εI_{\varepsilon} is closed. If P,Q𝒮 nP,Q\in \mathcal{S}_n with QI εQ\in I_{\varepsilon}, then PQP(εJ)=εJP Q\geq P (\varepsilon J)=\varepsilon J and so PQI εP Q\in I_{\varepsilon}, as required.

Consequently, the positive stochastic matrices form a left ideal of 𝒮 n\mathcal{S}_n (topologically open). A probability distribution π𝒫 n\pi\in \mathcal{P}_n is stationary for a stochastic matrix PP if πP=π\pi P=\pi, i.e., it is a fixed-point of PP.

Lemma 5  An idempotent positive stochastic matrix QQ has a unique stationary distribution π\pi. Moreover, π\pi is strictly positive, each row of QQ is π\pi, and all fixed vectors of QQ are multiples of π\pi. Consequently, QQ has rank 11 and acts as a constant map on 𝒫 n\mathcal{P}_n with image π\pi.

Proof:  Let Q ijQ_{i j} be the largest entry in column jj. Then k=1 nQ ik(Q ijQ kj)= k=1 nQ ikQ ij k=1 nQ ikQ kj=Q ijQ ij=0 \sum_{k=1}^n Q_{i k}(Q_{i j}-Q_{k j}) = \sum_{k=1}^n Q_{i k} Q_{i j}-\sum_{k=1}^n Q_{i k}Q_{k j} = Q_{i j}-Q_{i j}=0 and hence Q ij=Q kjQ_{i j}=Q_{k j}, for k=1,,nk=1,\ldots,n, as each Q ik>0Q_{i k} \gt 0 and Q ijQ kj0Q_{i j}-Q_{k j}\geq 0. Thus all rows of QQ are π\pi for some strictly positive probability distribution π\pi. Idempotency immediately yields πQ=π\pi Q=\pi. Moreover, since QQ is a rank 11 idempotent, every fixed vector of QQ must be a multiple of π\pi. Consequently, π\pi is the unique stationary distribution for QQ. Since the image of an idempotent mapping is its fixed-point set, we conclude that QQ acts on 𝒫 n\mathcal{P}_n as a constant map with image π\pi.

A stochastic matrix PP is said to be ergodic if P m>0P^m \gt 0 for some mm. Now we can prove one of the main theorems of finite Markov chain theory. Recall that an eigenvalue of a matrix is simple if it has multiplicity one as a root of the characteristic polynomial.

Theorem 6  Let PP be an ergodic stochastic matrix. Then:

  1. PP has a unique stationary distribution π\pi;
  2. π\pi is strictly positive;
  3. P nΠP^n\to \Pi where Π\Pi is the matrix all of whose rows are π\pi;
  4. if μ\mu is any probability distribution, then μP nπ\mu P^n\to \pi;
  5. the eigenvalue 11 of PP is simple and all other eigenvalues of PP have modulus strictly less than 11.

Proof:  Let P𝒮 nP\in \mathcal{S}_n with P m>0P^m \gt 0. Put S=P¯S=\overline{\langle P\rangle} and let II be the minimal ideal of SS. As ISP mI\subseteq S P^m, it follows from Lemma 4 that II consists of positive elements. Thus P ω>0P^{\omega} \gt 0 and so Lemma 5 implies that P ωP^{\omega} is a constant map on 𝒫 n\mathcal{P}_n with image π>0\pi \gt 0 its stationary distribution. Proposition 3 then yields that π\pi is the unique stationary distribution for PP, that P nP ωP^n\to P^{\omega} and that μP nπ\mu P^n\to \pi for all μ𝒫 n\mu\in \mathcal{P}_n. Moreover, P ω=ΠP^{\omega}=\Pi by Lemma 5.

For the final statement, observe that n= nP ωkerP ω=πkerP ω \mathbb{R}^n = \mathbb{R}^n P^{\omega} \oplus \ker P^{\omega} = \mathbb{R}\pi \oplus \ker P^{\omega} (the last equality by Lemma 5). Moreover, PP preserves this direct sum decomposition because PP and P ωP^{\omega} commute. Set V=kerP ωV=\ker P^{\omega}. From the fact that P nP ωP^n\to P^{\omega}, it follows that the powers of the restriction P| VP|_V of PP to VV converge to 00 and hence P| VP|_V has spectral radius strictly less than 11. Since the characteristic polynomial of PP is the product of x1x - 1 and the characteristic polynomial of P| VP|_V, it follows that 11 is a simple eigenvalue of PP.

A stochastic matrix PP is called irreducible if, for all i,ji,j, there exists k>0k \gt 0 such that P ij k>0P^k_{i j} \gt 0. It is easy to verify that if PP is irreducible, then Q=(1/2)I n+(1/2)PQ=(1/2) I_n + (1/2) P is ergodic, where I nI_n is the n×nn\times n identity matrix. Observe that vQ=vv Q=v if and only if v=vPv=v P for any vector vv. This leads to the existence and uniqueness of stationary distributions for irreducible Markov chains.

Corollary 7  Let PP be an irreducible stochastic matrix. Then PP has a unique stationary distribution π\pi. Moreover, π\pi is strictly positive and every fixed vector of PP is a multiple of π\pi.

Posted at January 3, 2012 10:31 PM UTC

TrackBack URL for this Entry:

5 Comments & 0 Trackbacks

Re: A Semigroup Approach to Finite Markov Chains

This is very nice! Thanks!

I’ll make just a trivial terminological comment for the moment. When you started using the phrase “eventual range” in earlier threads, I didn’t realize that it was actually an accepted term. I thought you were just taking what I was calling the eventual image and adapting it to your preference for the word “range” over “image”. But I see now that “eventual range” is a term in use, e.g. it’s defined on page 242 of Lind and Marcus’s book on symbolic dynamics.

(As far as I was concerned, “eventual image” was my own coinage. I googled it a while ago, and discovered that a few other people had used it too, but I didn’t think of trying it with “range” in place of “image”.)

Posted by: Tom Leinster on January 3, 2012 11:07 PM | Permalink | Reply to this

Re: A Semigroup Approach to Finite Markov Chains

Can the methods used here be related to the metric coinduction of Kozen and Ruozzi?

Posted by: David Corfield on January 4, 2012 9:32 AM | Permalink | Reply to this

Re: A Semigroup Approach to Finite Markov Chains

Good question. I have no idea.
Posted by: Benjamin Steinberg on January 5, 2012 12:52 AM | Permalink | Reply to this

Re: A Semigroup Approach to Finite Markov Chains

Looks cool, and though it is a bit beyond me, does it relate to earlier work on markov chains and semigroups?

Posted by: Trurl on January 4, 2012 6:41 PM | Permalink | Reply to this

Re: A Semigroup Approach to Finite Markov Chains

Yes and no. If SS is any compact semigroup then the probability measures on S form a compact semigroup under convolution. The stochastic matrices essentially form a special case. Some of the convergence results of Diaconis and Brown can be understood from this viewpoint but you need semigroup representation theory for the bounds on rate of convergence.
Posted by: Benjamin Steinberg on January 5, 2012 12:58 AM | Permalink | Reply to this

Post a New Comment