Search code examples
python-3.xmarkov-chainsmarkov-models

Generating Markov transition matrix in Python


Imagine I have a series of 4 possible Markovian states (A, B, C, D):

X = [A, B, B, C, B, A, D, D, A, B, A, D, ....]

How can I generate a Markov transformation matrix using Python? The matrix must be 4 by 4, showing the probability of moving from each state to the other 3 states. I've been looking at many examples online but in all of them, the matrix is given, not calculated based on data. I also looked into hmmlearn but nowhere I read on how to have it spit out the transition matrix. Is there a library that I can use for this purpose?

Here is an R code for the exact thing I am trying to do in Python: https://stats.stackexchange.com/questions/26722/calculate-transition-matrix-markov-in-r


Solution

  • This might give you some ideas:

    transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']
    
    def rank(c):
        return ord(c) - ord('A')
    
    T = [rank(c) for c in transitions]
    
    #create matrix of zeros
    
    M = [[0]*4 for _ in range(4)]
    
    for (i,j) in zip(T,T[1:]):
        M[i][j] += 1
    
    #now convert to probabilities:
    for row in M:
        n = sum(row)
        if n > 0:
            row[:] = [f/sum(row) for f in row]
    
    #print M:
    
    for row in M:
        print(row)
    

    output:

    [0.0, 0.5, 0.0, 0.5]
    [0.5, 0.25, 0.25, 0.0]
    [0.0, 1.0, 0.0, 0.0]
    [0.5, 0.0, 0.0, 0.5]
    

    On Edit Here is a function which implements the above ideas:

    #the following code takes a list such as
    #[1,1,2,6,8,5,5,7,8,8,1,1,4,5,5,0,0,0,1,1,4,4,5,1,3,3,4,5,4,1,1]
    #with states labeled as successive integers starting with 0
    #and returns a transition matrix, M,
    #where M[i][j] is the probability of transitioning from i to j
    
    def transition_matrix(transitions):
        n = 1+ max(transitions) #number of states
    
        M = [[0]*n for _ in range(n)]
    
        for (i,j) in zip(transitions,transitions[1:]):
            M[i][j] += 1
    
        #now convert to probabilities:
        for row in M:
            s = sum(row)
            if s > 0:
                row[:] = [f/s for f in row]
        return M
    
    #test:
    
    t = [1,1,2,6,8,5,5,7,8,8,1,1,4,5,5,0,0,0,1,1,4,4,5,1,3,3,4,5,4,1,1]
    m = transition_matrix(t)
    for row in m: print(' '.join('{0:.2f}'.format(x) for x in row))
    

    Output:

    0.67 0.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    0.00 0.50 0.12 0.12 0.25 0.00 0.00 0.00 0.00
    0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00
    0.00 0.00 0.00 0.50 0.50 0.00 0.00 0.00 0.00
    0.00 0.20 0.00 0.00 0.20 0.60 0.00 0.00 0.00
    0.17 0.17 0.00 0.00 0.17 0.33 0.00 0.17 0.00
    0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
    0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
    0.00 0.33 0.00 0.00 0.00 0.33 0.00 0.00 0.33