python-3.x, nltk, n-gram

I can't bigram a sentence with Python 3


I'm using Python 3 and I'm trying to generate the bigrams of a sentence, but the interpreter gives me output that I can't understand.

~$ python3
Python 3.6.9 (default, Nov  7 2019, 10:44:02) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> from nltk import word_tokenize
>>> from nltk.util import ngrams
>>> text = "Hi How are you? i am fine and you"
>>> token=nltk.word_tokenize(text)
>>> bigrams=ngrams(token,2)
>>> bigrams
<generator object ngrams at 0x7ff1d81d2468>
>>> print (bigrams)
<generator object ngrams at 0x7ff1d81d2468>

What does "generator object ngrams at 0x7ff1d81d2468" mean? Why can I neither inspect nor print the n-grams?


Solution

  • A generator object can be iterated over, but only once. When print displays one, it shows the object's type and memory address rather than the items it yields. You can convert the generator into a list using

    >>> bigrams=list(ngrams(token,2))

    and then print its items using

    >>> print(bigrams)

    Because bigrams is now a list, print shows its items instead of a description of the object.
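The "only once" behaviour described above can be seen in a small sketch that runs without NLTK. It rests on two assumptions: bigram_gen is a hypothetical stand-in for nltk.util.ngrams(token, 2), and str.split() replaces nltk.word_tokenize, so the tokens differ slightly from what NLTK would produce.

```python
def bigram_gen(tokens):
    """Hypothetical stand-in for nltk.util.ngrams(tokens, 2): yields pairs lazily."""
    for i in range(len(tokens) - 1):
        yield (tokens[i], tokens[i + 1])

# Assumption: str.split() stands in for nltk.word_tokenize here.
tokens = "Hi How are you? i am fine and you".split()

gen = bigram_gen(tokens)
print(gen)                 # shows a description of the generator, not its items

first_pass = list(gen)     # consumes the generator and collects every bigram
second_pass = list(gen)    # the generator is already exhausted, so this is empty

print(first_pass[:3])
print(second_pass)

# If the pairs are only needed once, a fresh generator can be looped over directly.
for left, right in bigram_gen(tokens):
    pass  # process each bigram here
```

Storing the result of list(...) is the simplest choice when the bigrams need to be printed or reused. Looping over the generator directly avoids building the whole list in memory.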