How to generate a sample using n-order Markov Chains with R?


I'm attempting to generate a sample from an n-order transition matrix using Markov Chains in R. I've successfully constructed this n-order transition matrix using the following code:

set.seed(1)

dat <- sample(c("A", "B", "C"), size = 2000, replace = TRUE) # Data

n <- 2 # Order of the transition matrix
if (n > 1) {
  from <- head(apply(embed(dat, n)[, n:1], 1, paste, collapse = ""), -1)
  to <- dat[-1:-n]
} else {
  from <- dat[-length(dat)]
  to <- dat[-1]
}

fromTo <- data.frame(cbind(from, to))
TM <- table(fromTo)
TM <- TM / rowSums(TM) # Transition matrix

However, I'm having difficulty writing code that generates a sample from this transition matrix and works for any value of n. Is there a way to do it?

Ideally, I'd prefer a solution that doesn't involve the 'markovchain' package due to compatibility issues across different R versions.


Solution

  • Update

    If you just want to draw the next state from the transition matrix, given the preceding state, you can try the code below. It builds on the MarkovChain function from the previous answer (shown under "Previous" below).

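    MarkovChainSampling <- function(dat, ord, preStat){
      # Transition matrix: rows are preceding states, columns are next states
      TM <- MarkovChain(dat, ord)
      # Draw one next state using the probabilities in the row for preStat
      sample(colnames(TM), 1, prob = TM[preStat, ])
    }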
    

    For example (each call draws at random, so results vary between runs):

    > MarkovChainSampling(dat, 2, "A")
    [1] "C"
    
    > MarkovChainSampling(dat, 3, "AB")
    [1] "A"
    
    > MarkovChainSampling(dat, 4, "AAA")
    [1] "C"
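
    To draw a whole sequence rather than a single next state, one option is to build the transition matrix once and then repeatedly sample from the row of the current state. The following is a minimal sketch under a few assumptions: build_tm and simulate_chain are hypothetical helper names (not part of this answer), k is the number of preceding symbols that make up a state, every symbol is a single character, and prop.table is used in place of proportions so that the code also runs on R versions before 4.0.0.

    ```r
    # Hypothetical helper: k-th order transition matrix, states read oldest -> newest.
    build_tm <- function(dat, k) {
      # embed() puts the newest value in column 1, so reverse the columns
      w <- embed(dat, k + 1)[, (k + 1):1, drop = FALSE]
      pre <- apply(w[, 1:k, drop = FALSE], 1, paste, collapse = "")
      cur <- w[, k + 1]
      prop.table(table(pre, cur), 1)  # each row sums to 1
    }

    # Hypothetical helper: extend the initial state `init` by n_new sampled symbols.
    simulate_chain <- function(tm, init, n_new) {
      k <- nchar(init)                      # state length (assumes 1-character symbols)
      out <- character(k + n_new)
      out[seq_len(k)] <- strsplit(init, "")[[1]]
      for (i in seq_len(n_new)) {
        # Current state = the k most recent symbols
        state <- paste(out[i:(i + k - 1)], collapse = "")
        if (!state %in% rownames(tm)) stop("state never observed: ", state)
        out[k + i] <- sample(colnames(tm), 1, prob = tm[state, ])
      }
      out
    }

    set.seed(1)
    dat <- sample(c("A", "B", "C"), size = 2000, replace = TRUE)
    tm <- build_tm(dat, 2)
    sim <- simulate_chain(tm, init = "AB", n_new = 20)
    ```

    Here sim holds the two starting symbols followed by 20 sampled ones, and paste(sim, collapse = "") returns it as a single string. Building the matrix once outside the loop avoids recomputing it for every draw.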
    

    Previous

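    I think you are after the transition matrix of a Markov chain of order n. Below is one option that might give you some ideas.

    You can use embed as shown below. Note that ord is the window width passed to embed, so each state is ord - 1 symbols long (ord = 2 gives single-letter states).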

    MarkovChain <- function(dat, ord) {
      d <- as.data.frame(embed(dat, ord))
      df <- with(
        d,
        data.frame(
          pre = do.call(paste, c(d[-ord], sep = "")),
          cur = d[[ord]]
        )
      )
      proportions(table(df), 1)
    }
    

    and you will obtain

    > MarkovChain(dat, 2)
       cur
    pre         A         B         C
      A 0.3377386 0.3509545 0.3113069
      B 0.3333333 0.3348281 0.3318386
      C 0.3513097 0.3174114 0.3312789
    
    > MarkovChain(dat, 3)
        cur
    pre          A         B         C
      AA 0.3347826 0.3826087 0.2826087
      AB 0.3430962 0.3263598 0.3305439
      AC 0.3396226 0.3160377 0.3443396
      BA 0.3273543 0.2959641 0.3766816
      BB 0.3392857 0.3482143 0.3125000
      BC 0.3783784 0.3063063 0.3153153
      CA 0.3524229 0.3700441 0.2775330
      CB 0.3155340 0.3300971 0.3543689
      CC 0.3348837 0.3302326 0.3348837
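
    One detail that may be worth checking before relying on these tables: embed places the most recent value in the first column, so in the function above d[[ord]] is the oldest element of each window and the remaining columns run from newest to oldest. The small sketch below, using a made-up four-letter vector, shows the column order and one way to reverse it so that pre reads in chronological order and cur is the symbol that follows. It is only a suggested check, not part of the original answer.

    ```r
    x <- c("A", "B", "C", "D")   # toy sequence in time order

    w <- embed(x, 3)
    # Rows are C B A and D C B: column 1 is the newest value in each window
    print(w)

    # Reversing the columns gives windows in chronological order
    w_chron <- w[, 3:1, drop = FALSE]
    pre <- apply(w_chron[, 1:2, drop = FALSE], 1, paste, collapse = "")
    cur <- w_chron[, 3]
    print(data.frame(pre, cur))  # pre = "AB", "BC"; cur = "C", "D"
    ```

    In MarkovChain, the analogous change would be to call embed(dat, ord)[, ord:1, drop = FALSE] before converting to a data frame. The probabilities would then generally differ from the ones printed above.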