The script below produces a probability matrix for a given list:
transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']
def rank(c):
return ord(c) - ord('A')
T = [rank(c) for c in transitions]
#create matrix of zeros
M = [[0]*4 for _ in range(4)]
for (i,j) in zip(T,T[1:]):
M[i][j] += 1
#now convert to probabilities:
for row in M:
n = sum(row)
if n > 0:
row[:] = [f/sum(row) for f in row]
#print M:
for row in M:
print(row)
output
[0.0, 0.5, 0.0, 0.5]
[0.5, 0.25, 0.25, 0.0]
[0.0, 1.0, 0.0, 0.0]
[0.5, 0.0, 0.0, 0.5]
I now want to do the opposite, and make a new transition list of A B C D following the probability matrix.
How can I make this happen?
The choices
function of the random library could be of help. As the question doesn't indicate how to choose the first letter, here it is chosen with the same probability as the contents of the original list.
Since Python 3.6 random.choices
accepts a parameter with weights. It is not strictly necessary to normalize them.
import random
letter = random.choice(transitions) # take a starting letter with the same weights as the original list
new_list = [letter]
for _ in range(len(transitions) - 1):
letter = chr(random.choices(range(4), weights=M[rank(letter)])[0] + ord('A'))
new_list.append(letter)
print(new_list)
The full code could be generalized somewhat to work with any kind of nodes, not just consecutive letters:
from _collections import defaultdict
import random
transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']
nodes = sorted(set(transitions)) # a list of all letters used
M = defaultdict(int) # dictionary counting the occurrences for each transition i,j)
for (i, j) in zip(transitions, transitions[1:]):
M[(i, j)] += 1
# dictionary with for each node a list of frequencies for the transition to a next node
T = {i: [M[(i, j)] for j in nodes] for i in nodes}
# node = random.choice(transitions) # chose the first node randomly with the same probability as the original list
node = random.choice(nodes) # chose the first node randomly, each node with equal probability
new_list = [node]
for _ in range(9):
node = random.choices(nodes, T[node])[0]
new_list.append(node)
print(new_list)
Example output:['D', 'A', 'D', 'A', 'D', 'D', 'A', 'D', 'A', 'B']