
Override a function in nltk - Error in ContextIndex class


I am using the similar() method of the nltk.Text class, e.g. text.similar('example').

(It prints the words that occur in contexts similar to those of the given word in the corpus.)

However, I want to store those words in a list, but the method only prints them and returns None.

    # text is an nltk.Text instance
    simList = text.similar("physics")

    >>> a = text.similar("physics")
    the and a in science this which it that energy his of but chemistry is
    space mathematics theory as mechanics
    >>> a
    # nothing is displayed: a is None
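One workaround that needs no override is to capture what similar() prints and split it into words. This is a minimal sketch, assuming Python 3 (for contextlib.redirect_stdout and io.StringIO) and NLTK 3.x; the short token list is made up and stands in for the real corpus:

```python
import contextlib
import io

import nltk

# Made-up corpus; in practice this would be the tokens behind your Text.
tokens = "the cat sat on the mat . the dog sat on the rug .".split()
text = nltk.Text(tokens)

# Redirect everything similar() prints into an in-memory buffer.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    returned = text.similar("cat")  # prints its results, returns None

# The printed words are whitespace-separated (possibly wrapped over lines).
captured = buffer.getvalue().split()
print(returned)
print(captured)
```

Parsing printed output is fragile: if the word is not in the corpus, similar() prints "No matches", and the list becomes ['No', 'matches'].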

Should I modify the library's source code? I don't think that is good practice. How can I override the method so that it returns the list instead?
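Overriding works without touching NLTK's source: subclass nltk.Text and redefine similar(). This is a minimal sketch, assuming NLTK 3.x; the class name ListText and the cache attribute _list_context_index are hypothetical. It delegates to ContextIndex.similar_words(), which already returns a list, and passes it self.tokens, which is a real list:

```python
import nltk
from nltk.text import ContextIndex


class ListText(nltk.Text):
    """Hypothetical nltk.Text subclass whose similar() returns the words."""

    def similar(self, word, num=20):
        # Build the word-context index once and cache it on the instance.
        if not hasattr(self, "_list_context_index"):
            self._list_context_index = ContextIndex(
                self.tokens,                 # a list, so len() and indexing work
                filter=lambda x: x.isalpha(),  # ignore punctuation tokens
                key=lambda s: s.lower(),       # compare words case-insensitively
            )
        # similar_words() compares against lowercased keys, so lowercase here too.
        return self._list_context_index.similar_words(word.lower(), num)


tokens = "the cat sat on the mat . the dog sat on the rug .".split()
text = ListText(tokens)
result = text.similar("cat")
print(result)
```

Note that similar_words() scores candidates by multiplying context counts, while Text.similar() in recent NLTK 3 releases counts shared contexts, so the two may rank words differently on a large corpus.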

Edit: Following this thread, I tried using the ContextIndex class, but I get the following error:

  File "test.py", line 39, in <module>
    text = nltk.text.ContextIndex(word.lower() for word in words)
  File "/home/kenden/den/codes/nlpenv/local/lib/python2.7/site-packages/nltk/text.py", line 56, in __init__
    for i, w in enumerate(tokens))
  File "/home/kenden/den/codes/nlpenv/local/lib/python2.7/site-packages/nltk/probability.py", line 1752, in __init__
    for (cond, sample) in cond_samples:
  File "/home/kenden/den/codes/nlpenv/local/lib/python2.7/site-packages/nltk/text.py", line 56, in <genexpr>
    for i, w in enumerate(tokens))
  File "/home/kenden/den/codes/nlpenv/local/lib/python2.7/site-packages/nltk/text.py", line 43, in _default_context
    right = (tokens[i+1].lower() if i != len(tokens) - 1 else '*END*')
TypeError: object of type 'generator' has no len()

This is line 39 of test.py:

    text = nltk.text.ContextIndex(word.lower() for word in words)

How can I solve this?


Solution

  • You get the error because the ContextIndex constructor builds each token's context with _default_context, which calls len(tokens) and indexes tokens[i+1] on your token sequence (the argument tokens). You passed a generator, which supports neither len() nor indexing, hence the error. To avoid the problem, pass a real list instead, e.g.:

    text = nltk.text.ContextIndex([word.lower() for word in words])
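The difference is easy to reproduce. This is a minimal sketch, assuming Python 3 and NLTK 3.x, with a made-up token list: the generator is rejected, the list works, and similar_words() then returns the similar words as a list, which was the original goal:

```python
from nltk.text import ContextIndex

words = "The cat sat on the mat . The dog sat on the rug .".split()

# A generator has no len() and cannot be indexed, so building the
# default (left, right) contexts fails with TypeError.
generator_failed = False
try:
    ContextIndex(word.lower() for word in words)
except TypeError as exc:
    generator_failed = True
    print("generator rejected:", exc)

# A real list supports len() and tokens[i + 1], so construction succeeds.
index = ContextIndex([word.lower() for word in words])

# similar_words() returns a list of words instead of printing them.
similar = index.similar_words("cat")
print(similar)
```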