Search code examples
rstatisticsprobabilitymarkov-chains

Manual simulation of Markov Chain in R


Consider the Markov chain with state space S = {1, 2}, transition matrix

enter image description here

and initial distribution α = (1/2, 1/2).

  1. Simulate 5 steps of the Markov chain (that is, simulate X0, X1, . . . , X5). Repeat the simulation 100 times. Use the results of your simulations to solve the following problems.

    • Estimate P(X1 = 1|X0 = 1). Compare your result with the exact probability.

My solution:

# returns Xn 
func2 <- function(alpha1, mat1, n1) 
{
  xn <- alpha1 %*% matrixpower(mat1, n1+1)

  return (xn)
}

alpha <- c(0.5, 0.5)
mat <- matrix(c(0.5, 0.5, 0, 1), nrow=2, ncol=2)
n <- 10


for (variable in 1:100) 
{
   print(func2(alpha, mat, n))
}

What is the difference if I run this code once or 100 times (as is said in the problem-statement)?

How can I find the conditional probability from here on?


Solution

  • Let

    alpha <- c(1, 1) / 2
    mat <- matrix(c(1 / 2, 0, 1 / 2, 1), nrow = 2, ncol = 2) # Different than yours
    

    be the initial distribution and the transition matrix. Your func2 only finds n-th step distribution, which isn't needed, and doesn't simulate anything. Instead we may use

    chainSim <- function(alpha, mat, n) {
      out <- numeric(n)
      out[1] <- sample(1:2, 1, prob = alpha)
      for(i in 2:n)
        out[i] <- sample(1:2, 1, prob = mat[out[i - 1], ])
      out
    }
    

    where out[1] is generated using only the initial distribution and then for subsequent terms we use the transition matrix.

    Then we have

    set.seed(1)
    # Doing once
    chainSim(alpha, mat, 1 + 5)
    # [1] 2 2 2 2 2 2
    

    so that the chain initiated at 2 and got stuck there due to the specified transition probabilities.

    Doing it for 100 times we have

    # Doing 100 times
    sim <- replicate(chainSim(alpha, mat, 1 + 5), n = 100)
    rowMeans(sim - 1)
    # [1] 0.52 0.78 0.87 0.94 0.99 1.00
    

    where the last line shows how often we ended up in state 2 rather than 1. That gives one (out of many) reasons why 100 repetitions are more informative: we got stuck at state 2 doing just a single simulation, while repeating it for 100 times we explored more possible paths.

    Then the conditional probability can be found with

    mean(sim[2, sim[1, ] == 1] == 1)
    # [1] 0.4583333
    

    while the true probability is 0.5 (given by the upper left entry of the transition matrix).