Extending a class in nltk - python


The aim is to add new functions to the wordnet class in nltk, e.g.:

from nltk.corpus import wordnet

class WN(wordnet):
    def foobar(self):
        print 'foobar'

x = WN
WN.foobar()

but it gives an error:

Traceback (most recent call last):
  File "/home/alvas/workspace/pybabel-fresh/babelnet/utils/pybabel_WordNet.py", line 5, in <module>
    class WN(wordnet):
  File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/util.py", line 44, in __init__
    assert issubclass(reader_cls, CorpusReader)
TypeError: Error when calling the metaclass bases
    issubclass() arg 1 must be a class
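The traceback makes sense once you note that nltk.corpus.wordnet is an object (a lazy loader instance), not a class. When you use an instance as a base class, Python treats the instance's type as the metaclass and calls it with the new class's name, bases and namespace. Here is a minimal Python 3 sketch. The Loader and CorpusReader classes are hypothetical stand-ins modelled on the assert shown in the traceback, not NLTK's actual code:

```python
class CorpusReader:
    """Hypothetical stand-in for a reader base class (not NLTK's)."""


class Loader:
    """Hypothetical stand-in for a lazy loader object like nltk.corpus.wordnet."""

    def __init__(self, name, reader_cls, *args, **kwargs):
        # Mirrors the check seen in the traceback.
        assert issubclass(reader_cls, CorpusReader)
        self.name = name
        self.reader_cls = reader_cls


wordnet_like = Loader("wordnet", CorpusReader)  # an *instance*, not a class

try:
    # Python uses type(wordnet_like), i.e. Loader, as the metaclass and calls
    # Loader("WN", (wordnet_like,), namespace), so reader_cls is a tuple.
    class WN(wordnet_like):
        def foobar(self):
            return "foobar"
except TypeError as exc:
    error = str(exc)

print(error)  # issubclass() arg 1 must be a class
```

In this sketch, reader_cls receives the tuple of bases, so issubclass() rejects it. That is the same error as in the traceback above.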

So I tried with nltk.corpus.reader.WordNetCorpusReader (http://www.nltk.org/_modules/nltk/corpus/reader/wordnet.html#WordNetCorpusReader):

from nltk.corpus.reader import WordNetCorpusReader

class WN(WordNetCorpusReader):
    def __init__(self):
        self = WN.__init__()

    def foobar(self):
        return "foobar"

x = WN
x.foobar()

It still fails. It seems that with WordNetCorpusReader I need to instantiate the class, because I got:

Traceback (most recent call last):
  File "/home/alvas/workspace/pybabel-fresh/babelnet/utils/pybabel_WordNet.py", line 13, in <module>
    x.foobar()
TypeError: unbound method foobar() must be called with WN instance as first argument (got nothing instead)

Then I tried:

from nltk.corpus.reader import WordNetCorpusReader

class WN(WordNetCorpusReader):
    def foobar(self):
        return "foobar"

x = WN
for i in x.all_synsets():
    print i

[out]:

Traceback (most recent call last):
  File "/home/alvas/workspace/pybabel-fresh/babelnet/utils/pybabel_WordNet.py", line 10, in <module>
    for i in x.all_synsets():
TypeError: unbound method all_synsets() must be called with WN instance as first argument (got nothing instead)
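In both of the last two attempts, x = WN binds the class object itself. No instance is ever created, so methods that expect self cannot be called on x. Below is a minimal Python 3 sketch that uses a hypothetical Reader class in place of WordNetCorpusReader. Python 3 reports this problem as a missing self argument rather than with Python 2's "unbound method" wording:

```python
class Reader:
    """Hypothetical stand-in for a reader class such as WordNetCorpusReader."""

    def foobar(self):
        return "foobar"


x = Reader  # binds the class object itself; no instance is created
try:
    x.foobar()  # nothing is available to pass as self
except TypeError as exc:
    message = str(exc)  # e.g. "...missing 1 required positional argument: 'self'"

y = Reader()  # calling the class creates an instance
result = y.foobar()  # self is now supplied automatically
print(result)  # foobar
```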

How do I extend the nltk wordnet API with new functions? Note that the aim is to create a new class with the new functions.


Solution

  • Your second attempt seems closest. The problem there is with your constructor:

    class WN(WordNetCorpusReader):
        def __init__(self):
            self = WN.__init__()  # needs an instance as the first argument, recursive, and no need to assign to self
    

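    The __init__ method needs an instance as its first argument (here self). WN.__init__ is also the wrong method to call: it is the one you are currently defining, not the parent class's. As written, WN.__init__() raises a TypeError immediately because no instance is passed. If you did pass self, the method would call itself until Python gives up with RuntimeError: maximum recursion depth exceeded. Finally, you simply want to call the parent's method; you don't need to assign its return value (None) to self. Note also that x = WN only binds the class itself and never creates an instance, which is why x.foobar() raised the "unbound method" error.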
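The following minimal Python 3 sketch shows the difference between calling your own __init__ and calling the parent's. A hypothetical Base class stands in for WordNetCorpusReader, and the root value is a made-up path. In Python 3 the recursion surfaces as RecursionError, a subclass of RuntimeError:

```python
class Base:
    """Hypothetical stand-in for WordNetCorpusReader: requires a root argument."""

    def __init__(self, root):
        self.root = root


class Recursive(Base):
    def __init__(self, root):
        # Bug: calls its own __init__ again, so it never terminates.
        Recursive.__init__(self, root)


class Fixed(Base):
    def __init__(self, root):
        # Delegate to the parent class; its return value (None) is not needed.
        Base.__init__(self, root)


try:
    Recursive("/tmp/wordnet")
except RecursionError:
    recursed = True
else:
    recursed = False

ok = Fixed("/tmp/wordnet")
print(recursed, ok.root)  # True /tmp/wordnet
```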

    I think you meant to do this instead:

    from nltk.corpus.reader import WordNetCorpusReader
    import nltk
    
    class WN(WordNetCorpusReader):
        def __init__(self, *args):
            WordNetCorpusReader.__init__(self, *args)
    
        def foobar(self):
            return "foobar"
    

    The catch, though, is that your new class must receive the arguments WordNetCorpusReader.__init__ requires. In my version of nltk, that is a root argument (the location of the WordNet corpus data), passed as follows:

    >>> x = WN(nltk.data.find('corpora/wordnet'))
    >>> x.foobar()
    'foobar'
    >>> x.synsets('run')
    [Synset('run.n.01'), Synset('test.n.05'), ...]
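On Python 3, a forwarding constructor like this is commonly written with super(). The sketch below uses a hypothetical Base class in place of WordNetCorpusReader and assumes root is a required argument, as described above. It also shows that omitting root still fails:

```python
class Base:
    """Hypothetical stand-in for WordNetCorpusReader (root is required)."""

    def __init__(self, root, **options):
        self.root = root
        self.options = options


class WN(Base):
    def __init__(self, *args, **kwargs):
        # Python 3 spelling of Base.__init__(self, *args, **kwargs):
        # forward every argument unchanged to the parent constructor.
        super().__init__(*args, **kwargs)

    def foobar(self):
        return "foobar"


x = WN("/path/to/corpora/wordnet")  # hypothetical path
print(x.foobar(), x.root)  # foobar /path/to/corpora/wordnet

try:
    WN()  # the parent still requires root
except TypeError:
    missing_root = True
else:
    missing_root = False
```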
    

    A more convenient approach

    A more convenient way to do the same thing, which also looks up the corpus path only once, is as follows:

    import nltk
    from nltk.corpus.reader import WordNetCorpusReader

    class WN(WordNetCorpusReader):
        root = nltk.data.find('corpora/wordnet')  # class attribute: the path is looked up once, when the class is defined
        def __init__(self, *args, **kwargs):
            WordNetCorpusReader.__init__(self, WN.root, *args, **kwargs)  # supply root yourself, so callers need not pass it
    
        def foobar(self):
            return "foobar"
    

    Now test it:

    >>> x = WN()
    >>> x.foobar()
    'foobar'
    >>> x.synsets('run')
    [Synset('run.n.01'), Synset('test.n.05'), ...]
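The class attribute is computed once, when the class statement runs, and every WN() call reuses it. Only the path lookup is shared, though: each WN() still runs WordNetCorpusReader.__init__ in full. Here is a sketch that counts the lookups. The find_root and Base names are hypothetical stand-ins, with find_root playing the role of nltk.data.find:

```python
lookups = []


def find_root(name):
    """Hypothetical stand-in for nltk.data.find: records each lookup."""
    lookups.append(name)
    return "/path/to/" + name


class Base:
    """Hypothetical stand-in for WordNetCorpusReader."""

    def __init__(self, root):
        self.root = root


class WN(Base):
    # Evaluated once, when the class statement executes, not per instance.
    root = find_root("corpora/wordnet")

    def __init__(self, *args, **kwargs):
        Base.__init__(self, WN.root, *args, **kwargs)


a, b = WN(), WN()
print(len(lookups), a.root == b.root)  # 1 True
```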
    

    By the way, I've enjoyed seeing your work on the nltk tag.