Topics
Markov chains and transition diagrams with state vectors and transition matrices, including finding future probabilities using sn=Tns0 and identifying steady state from Ts=s.
Want a deeper conceptual understanding? Try our interactive lesson!
A Markov Chain is a probability model for a scenario that transitions between different states at fixed time intervals.
An example is a fluorescent molecule filmed frame by frame under a microscope. It has three states:
On: glowing
Off: temporarily dark
Bleached: permanently dead, will never glow again
At each frame, the molecule can transition between states, and those transitions have fixed probabilities:
If the molecule is on, there's an 80% chance it stays on, a 15% chance it turns off, and a 5% chance it bleaches.
If the molecule is off, there's a 60% chance it stays off, and a 40% chance it turns back on.
If the molecule is bleached, it's off forever, so there's a 100% probability it stays in the bleached state.
We represent this system with a transition diagram:
The transition matrix is a square matrix with order n×n, where n is the number of states. In the ith row and jth column of the matrix we out the probability of going from state j to state i. Let's revisit our example from earlier:
Let's call the states 1 for on, 2 for off, and 3 for bleached. The transition matrix is 3×3, so something like:
Now let's fill in the first column by looking at what can happen from the "On" state, since we said "On" was column 1.
There's a 0.8 chance of staying in "On", so in row 1 we put 0.8.
There's a probability of 0.15 that the molecule moves to "Off", which is row 2, so we put 0.15 there.
Finally, we put 0.05 in row 3, because the probability of going from 1→3 (which is "On" to "Bleached" is 0.05)
If we repeat that process for the other states, we find the other columns:
Once we have the transition matrix, we can use it to see how the state evolves at each step. Imagine a molecule starts in the "On" state, which we said was the first row / column. We can represent this state with the vector
which just says "there's a 100% chance the molecule is in the 'On' state right now". We call this vector the initial state vector, denoted s0.
One frame later, the state evolves according to the transition matrix:
We call this vector s1, the state vector after 1 time step.
In general, the state vector after n steps is sn, and it's entries give the probability of being in each state at that time.
Notice that for Markov transition matrices, the columns represent the different starting vertices, and the rows the ending vertices.
This is the opposite of the convention we used for transition matrices of graphs in general. In fact, the IB uses an un common convention for Markov chains. Most textbooks and papers write
Instead of the IB's Ts0. This makes sense because we start with s0, and then apply T. In that case, each row is for a starting vertex, and each column for an end vertex. But the IB prefers Ts0, so we must too. On exams, Markov transitions matrices given to you will have columns that sum to 1, but if you use the so called row-stochastic convention you will not be penalized (historically at least).
Just as the first time step involves multiplying the state vector by the transition matrix, every subsequent step we just multiply by the transition matrix again.
Starting from state s0, we take n steps to get to sn, which means we multiply by T a total of n times:
In our molecule example, if we want to find s100 we calculate using technology:
This tells us that if a molecule starts in the "On" state, 100 times steps later there's a 97.4% chance it's in the "bleached" state. This makes sense because once in the bleached state, a molecule will stay in the bleached state. So in the long term, the likelihood of being in the bleached state should approach 100%.
Some transition matrices have eigenvectors. As a reminder, this means that
But the probabilities in s must add to 1, and the same must be true for Ts. That forces λ=1, so any eigenvector of a transition matrix has eigenvalue 1:
That means that when the transition matrix is applied to this state, the state does not change. We therefore call this state the steady state.
Finding eigenvectors is usually hard, but because we know the eigenvalue is 1, it's easier for transition matrices. For example, consider the transition matrix
With eigenvalue 1, the eigenvector (xy) we are looking for satisfies
Both rows give the same result: 0.8y=0.4x⇒x=2y, but we also know that x+y=1:
With a calculator, the fastest way to find the steady state is just to compute a large power of the transition matrix:
The columns will be the same if a steady state exists, so we just read one column and find the steady state is (2/31/3).
For any starting state (p1−p) (has to sum to 1), the state after 100 steps is
Nice work completing Markov Chains, here's a quick recap of what we covered:
Exercises checked off
Markov chains and transition diagrams with state vectors and transition matrices, including finding future probabilities using sn=Tns0 and identifying steady state from Ts=s.
Want a deeper conceptual understanding? Try our interactive lesson!
A Markov Chain is a probability model for a scenario that transitions between different states at fixed time intervals.
An example is a fluorescent molecule filmed frame by frame under a microscope. It has three states:
On: glowing
Off: temporarily dark
Bleached: permanently dead, will never glow again
At each frame, the molecule can transition between states, and those transitions have fixed probabilities:
If the molecule is on, there's an 80% chance it stays on, a 15% chance it turns off, and a 5% chance it bleaches.
If the molecule is off, there's a 60% chance it stays off, and a 40% chance it turns back on.
If the molecule is bleached, it's off forever, so there's a 100% probability it stays in the bleached state.
We represent this system with a transition diagram:
The transition matrix is a square matrix with order n×n, where n is the number of states. In the ith row and jth column of the matrix we out the probability of going from state j to state i. Let's revisit our example from earlier:
Let's call the states 1 for on, 2 for off, and 3 for bleached. The transition matrix is 3×3, so something like:
Now let's fill in the first column by looking at what can happen from the "On" state, since we said "On" was column 1.
There's a 0.8 chance of staying in "On", so in row 1 we put 0.8.
There's a probability of 0.15 that the molecule moves to "Off", which is row 2, so we put 0.15 there.
Finally, we put 0.05 in row 3, because the probability of going from 1→3 (which is "On" to "Bleached" is 0.05)
If we repeat that process for the other states, we find the other columns:
Once we have the transition matrix, we can use it to see how the state evolves at each step. Imagine a molecule starts in the "On" state, which we said was the first row / column. We can represent this state with the vector
which just says "there's a 100% chance the molecule is in the 'On' state right now". We call this vector the initial state vector, denoted s0.
One frame later, the state evolves according to the transition matrix:
We call this vector s1, the state vector after 1 time step.
In general, the state vector after n steps is sn, and it's entries give the probability of being in each state at that time.
Notice that for Markov transition matrices, the columns represent the different starting vertices, and the rows the ending vertices.
This is the opposite of the convention we used for transition matrices of graphs in general. In fact, the IB uses an un common convention for Markov chains. Most textbooks and papers write
Instead of the IB's Ts0. This makes sense because we start with s0, and then apply T. In that case, each row is for a starting vertex, and each column for an end vertex. But the IB prefers Ts0, so we must too. On exams, Markov transitions matrices given to you will have columns that sum to 1, but if you use the so called row-stochastic convention you will not be penalized (historically at least).
Just as the first time step involves multiplying the state vector by the transition matrix, every subsequent step we just multiply by the transition matrix again.
Starting from state s0, we take n steps to get to sn, which means we multiply by T a total of n times:
In our molecule example, if we want to find s100 we calculate using technology:
This tells us that if a molecule starts in the "On" state, 100 times steps later there's a 97.4% chance it's in the "bleached" state. This makes sense because once in the bleached state, a molecule will stay in the bleached state. So in the long term, the likelihood of being in the bleached state should approach 100%.
Some transition matrices have eigenvectors. As a reminder, this means that
But the probabilities in s must add to 1, and the same must be true for Ts. That forces λ=1, so any eigenvector of a transition matrix has eigenvalue 1:
That means that when the transition matrix is applied to this state, the state does not change. We therefore call this state the steady state.
Finding eigenvectors is usually hard, but because we know the eigenvalue is 1, it's easier for transition matrices. For example, consider the transition matrix
With eigenvalue 1, the eigenvector (xy) we are looking for satisfies
Both rows give the same result: 0.8y=0.4x⇒x=2y, but we also know that x+y=1:
With a calculator, the fastest way to find the steady state is just to compute a large power of the transition matrix:
The columns will be the same if a steady state exists, so we just read one column and find the steady state is (2/31/3).
For any starting state (p1−p) (has to sum to 1), the state after 100 steps is
Nice work completing Markov Chains, here's a quick recap of what we covered:
Exercises checked off