I am trying to select the likely next word based on the current word, using previous word-pair occurrences as "weights". I am having trouble using np.random.choice() to actually choose the next word.
import pandas as pd
import numpy as np
texty = "won't you celebrate with me what i have shaped into a kind of life i had no model born in babylon both nonwhite and woman what did i see to be except myself i made it up here on this bridge between starshine and clay my one hand holding tight my other hand come celebrate with me that everyday something has tried to kill me and has failed."
# https://www.poetryfoundation.org/poems/50974/wont-you-celebrate-with-me
words = texty.split()
# Creating the text-based transition matrix
x = pd.crosstab(pd.Series(words[1:], name='next'),
                pd.Series(words[:-1], name='word'), normalize='columns')
print(x)
# Selecting the next word based on the current word.
# https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.choice.html
current = "and"
# this part isn't working--->
next = np.random.choice(current,1,current) # was "y"
I don't know how to refer to the transition matrix from here. I would like this choice to be based on the probabilities of previous occurrences. For example, the probability of "clay" following "and" is 33%.
x is a Pandas DataFrame.
You can access any of the columns of that DataFrame as if the column names were keys into a dictionary.
> print(x['won\'t'])
next
a 0.0
and 0.0
babylon 0.0
...
with 0.0
woman 0.0
you 1.0
Name: won't, dtype: float64
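As a quick sanity check on the 33% figure from the question, the same column lookup can be filtered down to the nonzero entries. This is only a sketch: it rebuilds x from the question's text, and the names col and nonzero are introduced here for illustration.

```python
import pandas as pd

texty = "won't you celebrate with me what i have shaped into a kind of life i had no model born in babylon both nonwhite and woman what did i see to be except myself i made it up here on this bridge between starshine and clay my one hand holding tight my other hand come celebrate with me that everyday something has tried to kill me and has failed."
words = texty.split()

# Same transition matrix as in the question: each column sums to 1.
x = pd.crosstab(pd.Series(words[1:], name='next'),
                pd.Series(words[:-1], name='word'), normalize='columns')

# Column for the current word "and"; keep only the words that ever follow it.
col = x['and']
nonzero = col[col > 0]   # illustrative name, not from the original post
print(nonzero)           # "clay", "has" and "woman", each with probability 1/3
```

Every other entry in that column is 0.0, so those words can never be drawn when the current word is "and".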
The column returns as a Pandas Series. If you select a column from the DataFrame (your transition matrix x), the index of the Series you select will be the available words from the text, and the values will be their associated probabilities. You can provide each of these to np.random.choice to get the next word, with probabilities weighted from your transition matrix.
> current_word = 'won\'t'
> current_column = x[current_word]
> next_word = np.random.choice(current_column.index,
                               p=current_column.values)
> print(next_word)
you
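If you want more than one word, the same lookup can be repeated in a loop, feeding each chosen word back in as the current word. The following is a minimal sketch, not a definitive implementation: generate is a hypothetical helper name, the seed and starting word are arbitrary, and it uses np.random.default_rng instead of the legacy np.random.choice (both accept the p argument). Note that the last word of the text, "failed.", never appears as a column because nothing follows it, so the loop stops there.

```python
import numpy as np
import pandas as pd

texty = "won't you celebrate with me what i have shaped into a kind of life i had no model born in babylon both nonwhite and woman what did i see to be except myself i made it up here on this bridge between starshine and clay my one hand holding tight my other hand come celebrate with me that everyday something has tried to kill me and has failed."
words = texty.split()

# Transition matrix: rows are next words, columns are current words.
x = pd.crosstab(pd.Series(words[1:], name='next'),
                pd.Series(words[:-1], name='word'), normalize='columns')

def generate(x, start, n_words, rng):
    """Hypothetical helper: walk the transition matrix for up to n_words words."""
    out = [start]
    current = start
    for _ in range(n_words - 1):
        # A word with no recorded successor (e.g. the final word) ends the walk.
        if current not in x.columns:
            break
        col = x[current]
        # Draw the next word, weighted by the probabilities in this column.
        current = rng.choice(col.index.to_numpy(), p=col.to_numpy())
        out.append(current)
    return out

rng = np.random.default_rng(0)   # arbitrary seed, only for repeatable runs
chain = generate(x, "and", 8, rng)
print(" ".join(chain))
```

Because most words in this short text have only one possible successor, the output will often reproduce stretches of the original poem.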