Markov Chains
1. Stochastic Processes
A random process (or stochastic process) is a family \(\{X_t, t\in T\}\) of random variables indexed by some set \(T\). As time passes, \(X_t\) “evolves” in a random but prescribed way. Stochastic processes can be described by:
- state space \(S\), the possible values of \(X_t\)
- time index set \(T\)
- dependence relation among \(X_t\) over time (i.e. some probabilistic law that describes how \(X_t\) moves to \(X_{t+1}\))
2. Markov Chains
We will consider stochastic processes with a finite state space and discrete time, called Markov chains.
Markov chains satisfy the Markov property: for all \(n \in T\), all \(a_0, \dots, a_{n-1} \in S\), and all \(i,j \in S\), we have:
\begin{align} \mathbb{P}[X_{n+1}=j|X_0=a_0, X_1=a_1, \dots, X_{n-1}=a_{n-1},X_n=i] = \mathbb{P}[X_{n+1}=j|X_n=i] \end{align}Intuitively, this means that the future depends on the current state, but not on how you got there.
2.1. Transition Probability
The probability \(\mathbb{P}[X_{n+1}=j|X_n=i]\) of going from one state to the next is called the transition probability. If the transition probability does not depend on the time step \(n\), then the Markov chain is called homogeneous.
We can define a transition probability matrix \(P\) such that
\begin{align} P_{ij} = \mathbb{P}[X_{n+1}=j|X_n=i] \end{align}We can specify the initial distribution of \(X_0\) as \(\mathbb{P}[X_0=i]=\pi_0(i)\). The distribution of \(X_0\) and the transition probability matrix are all that is needed to fully specify a Markov chain.
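As a quick illustration of this point, the sketch below simulates a chain using only \(\pi_0\) and \(P\); it assumes NumPy, and the 3-state matrix is made up purely for the example.
```python
import numpy as np

# Hypothetical 3-state chain, used only for illustration.
# Row i of P is the distribution of X_{n+1} given X_n = i, so each row sums to 1.
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
])
pi0 = np.array([1.0, 0.0, 0.0])  # initial distribution of X_0

rng = np.random.default_rng(0)

def sample_path(pi0, P, n_steps):
    """Simulate X_0, ..., X_{n_steps} using only pi0 and P."""
    path = [rng.choice(len(pi0), p=pi0)]                     # draw X_0 ~ pi0
    for _ in range(n_steps):
        path.append(rng.choice(P.shape[1], p=P[path[-1]]))   # X_{n+1} ~ P[X_n, :]
    return path

print(sample_path(pi0, P, 10))
```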
2.2. Distribution
First, note that the joint probability of \(X_0,\dots,X_n\) is given by
\begin{align} \mathbb{P}[X_0=i_0, X_1=i_1, \dots, X_n=i_n] = \mathbb{P}[X_0=i_0]\mathbb{P}[X_1=i_1|X_0=i_0]\mathbb{P}[X_2=i_2|X_0=i_0,X_1=i_1]\cdots \notag \end{align}By the Markov property, this simplifies to:
\begin{align} \mathbb{P}[X_0=i_0, X_1=i_1, \dots, X_n=i_n] &= \mathbb{P}[X_0=i_0]\mathbb{P}[X_1=i_1|X_0=i_0]\mathbb{P}[X_2=i_2|X_1=i_1]\cdots \notag \\ &= \pi_0(i_0)P_{i_0,i_1}P_{i_1,i_2}\cdots P_{i_{n-1},i_n} \notag \end{align}Consequently, the marginal distribution of \(X_n\) can be computed as
\begin{align} \mathbb{P}[X_n=i_n] &= \sum_{i_0,i_1,\dots,i_{n-1}\in S} \mathbb{P}[X_0=i_0,X_1=i_1,\dots X_{n-1}=i_{n-1},X_n=i_n] \notag \\ &= \sum_{i_0,i_1,\dots,i_{n-1}\in S} \pi_0(i_0)P_{i_0,i_1}\cdots P_{i_{n-1},i_n} \notag \\ &= [\pi_0 P^n]_{i_n} \notag \end{align}In fact, for all \(n \geq 0\), \(\boxed{\pi_n=\pi_0 P^n}\), where \(\pi_n\) denotes the distribution of \(X_n\) as a row vector.
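A minimal numerical check of \(\pi_n = \pi_0 P^n\), again assuming NumPy and a made-up 3-state matrix:
```python
import numpy as np

# Hypothetical 3-state chain, used only for illustration.
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
])
pi0 = np.array([1.0, 0.0, 0.0])   # distribution of X_0

n = 5
pi_n = pi0 @ np.linalg.matrix_power(P, n)   # pi_n = pi_0 P^n
print(pi_n, pi_n.sum())                     # a probability vector summing to 1
```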
2.3. Hitting Probability
Consider a random walk on the states \(\{0, 1, \dots, N\}\) that moves up by \(1\) with probability \(p\) and down by \(1\) with probability \(1-p\). Let \(A\) be the event of hitting \(0\) before \(N\). Then, let \(\alpha(i) = \mathbb{P}[A|X_0=i]\), and we can see that \(\alpha(0)=1\) and \(\alpha(N)=0\).
By the Markov property and homogeneity, we have
\begin{align} \mathbb{P}[A|X_0=i, X_1=j] = \mathbb{P}[A|X_1=j]=\mathbb{P}[A|X_0=j] = \alpha(j) \notag \end{align}Then,
\begin{align} \alpha(i) = \sum_{j \in S}P_{ij}\alpha(j) = (1-p)\alpha(i-1) + p\alpha(i+1) \quad \text{for } 0 < i < N \notag \end{align}Solving this recurrence with the boundary conditions \(\alpha(0)=1\) and \(\alpha(N)=0\), we get
\begin{align} \alpha(i) = \begin{cases} \frac{N-i}{N} &p = \frac{1}{2} \\ \frac{r^i-r^N}{1-r^N} &p \neq \frac{1}{2} \end{cases} \end{align}where \(r = \frac{1-p}{p}\).
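As a sanity check of this closed form, the sketch below (assuming NumPy, with arbitrary illustrative values of \(N\) and \(p\)) solves the recurrence \(\alpha(i) = (1-p)\alpha(i-1) + p\alpha(i+1)\) as a linear system and compares it with the formula for \(p \neq \frac{1}{2}\):
```python
import numpy as np

# Random walk on {0, ..., N}: step +1 w.p. p, step -1 w.p. 1-p.
# alpha(i) = P[hit 0 before N | X_0 = i], with alpha(0) = 1 and alpha(N) = 0.
N, p = 10, 0.4

# Solve alpha(i) = (1-p) alpha(i-1) + p alpha(i+1) for 0 < i < N as a linear system.
A = np.eye(N + 1)
b = np.zeros(N + 1)
b[0] = 1.0                       # boundary condition alpha(0) = 1 (alpha(N) = 0 already)
for i in range(1, N):
    A[i, i - 1] -= (1 - p)
    A[i, i + 1] -= p
alpha_solved = np.linalg.solve(A, b)

# Closed form for p != 1/2, with r = (1-p)/p.
r = (1 - p) / p
alpha_closed = (r ** np.arange(N + 1) - r ** N) / (1 - r ** N)

print(np.allclose(alpha_solved, alpha_closed))   # True
```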
3. Distributions
3.1. Stationary Distribution
The row vector \(\pi\), indexed by the states, is called a stationary distribution or invariant distribution of a Markov chain with transition probability matrix \(P\) if
- \(\pi_i \geq 0 \; \forall i \in S\) and \(\sum_{i\in S}\pi_i = 1\)
- \(\pi P = \pi\) (left eigenvector with eigenvalue \(1\))
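Since \(\pi P = \pi\) says that \(\pi\) is a left eigenvector of \(P\) with eigenvalue \(1\), one way to compute a stationary distribution numerically is via the eigendecomposition of \(P^\top\). A minimal sketch, assuming NumPy and a made-up 3-state matrix:
```python
import numpy as np

# Hypothetical 3-state transition matrix, used only for illustration.
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
])

# pi P = pi  <=>  P^T pi^T = pi^T, so pi is a right eigenvector of P.T with eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(vals - 1.0))   # index of the eigenvalue closest to 1
pi = np.real(vecs[:, k])
pi = pi / pi.sum()                  # normalize so the entries sum to 1

print(pi, np.allclose(pi @ P, pi))  # pi is stationary: pi P = pi
```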
State \(j\) is accessible from state \(i\) if \([P^n]_{ij} > 0\) for some positive \(n \in \mathbb{N}\) (there is a path from \(i\) to \(j\)). States \(i\) and \(j\) intercommunicate if \(j\) is accessible from \(i\) and \(i\) is also accessible from \(j\).
A Markov chain is called irreducible if every pair of states \(i\) and \(j\) in the state space intercommunicates, i.e. any state can reach any other state. Every Markov chain with a finite state space has at least one stationary distribution. Additionally, if that chain is irreducible, the stationary distribution is unique.
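One way to check irreducibility numerically: if \(i\) can reach \(j\) at all, it can do so by a path of length at most \(|S|\), so it suffices to inspect the first \(|S|\) powers of \(P\). A minimal sketch assuming NumPy:
```python
import numpy as np

def is_irreducible(P):
    """Check that every state can reach every other state.

    A chain on |S| states is irreducible iff, for every pair (i, j),
    [P^n]_{ij} > 0 for some 1 <= n <= |S|.
    """
    S = P.shape[0]
    reach = np.zeros_like(P, dtype=bool)
    Pn = np.eye(S)
    for _ in range(S):
        Pn = Pn @ P
        reach |= Pn > 0
    return bool(reach.all())

# A 2-state chain that never leaves state 0 is not irreducible.
print(is_irreducible(np.array([[1.0, 0.0], [0.5, 0.5]])))   # False
print(is_irreducible(np.array([[0.5, 0.5], [0.5, 0.5]])))   # True
```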
The period \(d(i)\) of a state \(i\in S\) is defined as
\begin{align} d(i) = \text{gcd}\{n \geq 1 : [P^n]_{ii}>0\} \end{align}In other words, \(d(i)\) is the GCD of the lengths of all paths that return from state \(i\) back to itself. A state is periodic if \(d(i) > 1\), and aperiodic if \(d(i) = 1\).
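The period can also be estimated numerically by taking the GCD of the return times seen in the first few powers of \(P\). A sketch assuming NumPy; the cutoff below is a heuristic that works for small examples, not a general guarantee:
```python
import numpy as np
from math import gcd

def period(P, i, max_n=None):
    """d(i): gcd of the return times n with [P^n]_{ii} > 0, checked up to max_n."""
    S = P.shape[0]
    if max_n is None:
        max_n = 4 * S        # heuristic cutoff, enough for small examples
    d = 0
    Pn = np.eye(S)
    for n in range(1, max_n + 1):
        Pn = Pn @ P
        if Pn[i, i] > 0:
            d = gcd(d, n)    # gcd(0, n) = n, so the first return time initializes d
    return d

# A deterministic 2-cycle 0 -> 1 -> 0 has period 2; adding a self-loop makes it aperiodic.
print(period(np.array([[0.0, 1.0], [1.0, 0.0]]), 0))   # 2
print(period(np.array([[0.5, 0.5], [1.0, 0.0]]), 0))   # 1
```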
Theorem: If \(i\) and \(j\) intercommunicate, then \(d(i)=d(j)\).
Proof: Recall that \(d(i) = \text{gcd}\{n\geq 1 : [P^n]_{ii}>0\}\). To show that \(d(i)=d(j)\), it suffices to show that \(d(i) | d(j)\) and \(d(j) | d(i)\).
Since \(i\) and \(j\) intercommunicate, there is a path from \(i\) to \(j\) with length \(l_{ij}\), and a path from \(j\) to \(i\) with length \(l_{ji}\). Concatenating them gives a path from \(i\) back to \(i\) with length \(l_{ij}+l_{ji}\). Moreover, for an arbitrary path from \(j\) back to \(j\) with length \(k\), going from \(i\) to \(j\), around that path, and back from \(j\) to \(i\) gives a path from \(i\) back to \(i\) with length \(l_{ij}+k+l_{ji}\). Then,
\begin{align} d(i)&|l_{ij}+l_{ji}+k \notag \\ d(i) &| l_{ij} + l_{ji} \notag \\ \Rightarrow d(i) &| k \notag \end{align}So \(d(i)\) divides the length \(k\) of every path from \(j\) back to \(j\), and hence it divides their greatest common divisor, i.e. \(d(i)|d(j)\). By symmetry, we also see that \(d(j)|d(i)\). \(\square\)
3.2. Limiting Distribution
The Fundamental Theorem of Markov Chains says that if a finite Markov chain is irreducible and aperiodic, then it has a unique stationary distribution \(\pi\), and the distribution \(\pi_n\) of \(X_n\) converges to \(\pi\) as \(n \to \infty\), regardless of the initial distribution \(\pi_0\).
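A small numerical illustration of this convergence (assuming NumPy and a made-up irreducible, aperiodic matrix): iterating \(\pi_{n+1} = \pi_n P\) from two different starting distributions ends up at the same vector.
```python
import numpy as np

# Hypothetical irreducible, aperiodic 3-state chain, used only for illustration.
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
])

pi_a = np.array([1.0, 0.0, 0.0])   # two very different initial distributions
pi_b = np.array([0.0, 0.0, 1.0])

for _ in range(50):                # pi_{n+1} = pi_n P
    pi_a = pi_a @ P
    pi_b = pi_b @ P

print(pi_a, pi_b, np.allclose(pi_a, pi_b))   # both converge to the same stationary pi
```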
4. Random Walk on Undirected Graph
Let \(G=(V,E)\) be a finite, connected, undirected graph. We can model a random walk on this graph with a Markov chain: given that the current state is \(u\), we move to a neighbor of \(u\) chosen uniformly at random, i.e. to each neighbor with probability \(\frac{1}{\text{deg}(u)}\).
Since the graph is connected, this chain is irreducible, so its stationary distribution is unique; it is given by
\begin{align} \pi_v = \frac{\text{deg}(v)}{2|E|} \quad \forall v \in V \end{align}
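To verify the claim, note that \((\pi P)_v = \sum_{u \sim v} \frac{\text{deg}(u)}{2|E|}\cdot\frac{1}{\text{deg}(u)} = \frac{\text{deg}(v)}{2|E|} = \pi_v\). The sketch below checks this numerically on a small made-up graph, assuming NumPy:
```python
import numpy as np

# A small undirected graph given as an edge list, made up for illustration.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
V = 4

# Build the random-walk transition matrix: from u, move to each neighbor w.p. 1/deg(u).
deg = np.zeros(V)
P = np.zeros((V, V))
for u, v in edges:
    deg[u] += 1
    deg[v] += 1
    P[u, v] = 1.0
    P[v, u] = 1.0
P = P / deg[:, None]

pi = deg / (2 * len(edges))      # claimed stationary distribution deg(v) / (2|E|)
print(np.allclose(pi @ P, pi))   # True: pi P = pi
```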