I am trying to print the bigrams for a text in Python 3.5. The text is already pre-processed and split into individual words.
I tried two different ways (shown below), but neither works.
The first:
ninety_seven=df.loc[97]
nine_bi=ngrams(ninety_seven,2)
print(nine_bi)
This outputs:
<generator object ngrams at 0x0B4F9E70>
The second is:
ninety_seven=df.loc[97]
bigrm = list(nltk.bigrams(ninety_seven))
print(*map(' '.join, bigrm), sep=', ')
This outputs:
TypeError: sequence item 0: expected str instance, list found
df.loc[97] is [car, chip, indication, posted, flight, post, flight]
I want it to print as:
car chip, chip indication, indication posted, posted flight, flight post, post flight
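ngrams returns a generator, so printing it directly only shows the generator object; you have to iterate over it to get the bigrams. You also need to pass the list of words itself (here, the 'FSR Narrative' cell of the row) rather than the whole row. Try this: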
>>> ninety_seven=df.loc[97].loc['FSR Narrative']
>>> nine_bi=ngrams(ninety_seven,2)
>>> print(nine_bi)
<generator object ngrams at 0x7f879020f308>
>>> print([" ".join(t) for t in nine_bi])
['car chip', 'chip indication', 'indication posted', 'posted flight', 'flight post', 'post flight']
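The TypeError in the second attempt is consistent with the row holding more than one cell, each containing a list, so that the bigrams end up pairing lists instead of words. The sketch below reproduces that situation without pandas or NLTK. The dict standing in for the row, the "Other Column" name, and the bigrams helper are all assumptions made for illustration; only 'FSR Narrative' comes from the snippet above.

```python
# Hypothetical stand-in for one DataFrame row: each cell holds a token list.
# "Other Column" is an invented name; the real frame may look different.
row = {
    "FSR Narrative": ["car", "chip", "indication", "posted", "flight", "post", "flight"],
    "Other Column": ["some", "tokens"],
}


def bigrams(items):
    """Pair each item with its successor (a plain-Python stand-in for nltk.bigrams)."""
    items = list(items)
    return list(zip(items, items[1:]))


# Iterating over the whole row pairs up cells (lists), so " ".join() fails.
cells = list(row.values())
try:
    print([" ".join(pair) for pair in bigrams(cells)])
except TypeError as exc:
    print("TypeError:", exc)

# Selecting the single cell first yields pairs of strings, which join cleanly.
words = row["FSR Narrative"]
print([" ".join(pair) for pair in bigrams(words)])
```

If your frame really has only one column of interest, selecting that cell before calling ngrams or nltk.bigrams should be enough.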
Here is a simple example:
>>> from nltk import ngrams
>>> test = ['car', 'chip', 'indication', 'posted', 'flight', 'post', 'flight']
>>> nine_bi=ngrams(test,2)
>>> print(nine_bi)
<generator object ngrams at 0x7f879020f308>
>>> print([" ".join(t) for t in nine_bi])
['car chip', 'chip indication', 'indication posted', 'posted flight', 'flight post', 'post flight']
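To get the exact comma-separated line asked for in the question, the joined bigrams can be joined once more with ', '. The following is a minimal sketch that uses zip instead of NLTK so it runs on its own; bigram_pairs is a hypothetical helper, not part of NLTK, written as a generator to mimic the lazy behaviour seen above.

```python
def bigram_pairs(words):
    """Yield adjacent word pairs lazily (hypothetical helper, not an NLTK function)."""
    for first, second in zip(words, words[1:]):
        yield first, second


test = ['car', 'chip', 'indication', 'posted', 'flight', 'post', 'flight']

pairs = bigram_pairs(test)

# Join each pair with a space, then join the pairs with ', '.
formatted = ", ".join(" ".join(pair) for pair in pairs)
print(formatted)
# car chip, chip indication, indication posted, posted flight, flight post, post flight

# A generator can only be consumed once; a second pass yields nothing.
leftover = list(pairs)
print(leftover)  # []
```

Because the generator is exhausted after one pass, wrap it in list() first if you need the bigrams more than once.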