Search code examples
pythonmatrixprobabilitymarkov-chains

How to generate a random sequence given a probability matrix of transitions?


The script below produces a probability matrix for a given list:

transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']

def rank(c):
   return ord(c) - ord('A')

T = [rank(c) for c in transitions]

#create matrix of zeros

M = [[0]*4 for _ in range(4)]

for (i,j) in zip(T,T[1:]):
   M[i][j] += 1

#now convert to probabilities:
for row in M:
   n = sum(row)
   if n > 0:
       row[:] = [f/sum(row) for f in row]

#print M:
for row in M:
   print(row)

output

[0.0, 0.5, 0.0, 0.5]
[0.5, 0.25, 0.25, 0.0]
[0.0, 1.0, 0.0, 0.0]
[0.5, 0.0, 0.0, 0.5]

I now want to do the opposite, and make a new transition list of A B C D following the probability matrix.
How can I make this happen?


Solution

  • The choices function of the random library could be of help. As the question doesn't indicate how to choose the first letter, here it is chosen with the same probability as the contents of the original list.

    Since Python 3.6 random.choices accepts a parameter with weights. It is not strictly necessary to normalize them.

    import random
    
    letter = random.choice(transitions)  # take a starting letter with the same weights as the original list
    new_list = [letter]
    for _ in range(len(transitions) - 1):
        letter = chr(random.choices(range(4), weights=M[rank(letter)])[0] + ord('A'))
        new_list.append(letter)
    print(new_list)
    

    The full code could be generalized somewhat to work with any kind of nodes, not just consecutive letters:

    from _collections import defaultdict
    import random
    
    transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']
    
    nodes = sorted(set(transitions))  # a list of all letters used
    M = defaultdict(int)  # dictionary counting the occurrences for each transition i,j)
    
    for (i, j) in zip(transitions, transitions[1:]):
        M[(i, j)] += 1
    
    # dictionary with for each node a list of frequencies for the transition to a next node
    T = {i: [M[(i, j)] for j in nodes] for i in nodes}
    
    # node = random.choice(transitions) # chose the first node randomly with the same probability as the original list
    node = random.choice(nodes) # chose the first node randomly, each node with equal probability
    new_list = [node]
    for _ in range(9):
        node = random.choices(nodes, T[node])[0]
        new_list.append(node)
    
    print(new_list)
    

    Example output:['D', 'A', 'D', 'A', 'D', 'D', 'A', 'D', 'A', 'B']