nlp nltk pos-tagger index-error senna

list index out of range error when tag_sents() method of NLTK SennaTagger is called


Calling the tag_sents() method of NLTK's SennaTagger (http://www.nltk.org/_modules/nltk/tag/senna.html) raises IndexError: list index out of range.

A list of sentences is passed as the input to the tag_sents method.

A SENNA executable is needed to run the tagger. An installation guide for the SENNA toolkit can be found at http://ronan.collobert.com/senna/

Code:

from nltk.tag import SennaTagger

SENNA_EXECUTABLE_DIR = '../../tools/senna'

pos_tagger = SennaTagger(SENNA_EXECUTABLE_DIR)

tagged = pos_tagger.tag_sents(["All the banks are closed", "Today is Sunday"])

Output:

Traceback (most recent call last):

  File "<ipython-input-90-886051c3d91d>", line 1, in <module>
    tagged = pos_tagger.tag_sents(["All the banks are closed", "Today is Sunday"])

  File "F:\Programs\Anaconda3\lib\site-packages\nltk\tag\senna.py", line 55, in tag_sents
    tagged_sents = super(SennaTagger, self).tag_sents(sentences)

  File "F:\Programs\Anaconda3\lib\site-packages\nltk\classify\senna.py", line 161, in tag_sents
    result[tag] = tags[map_[tag]].strip()

IndexError: list index out of range

Solution

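  • The input to senna.tag_sents must be a list of lists of strings (one list of tokens per sentence), not a list of plain strings. A plain string gets iterated character by character instead of word by word. You can build the right input with [word_tokenize(sent) for sent in sents]: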

    >>> from nltk import word_tokenize
    >>> from nltk.tag import SennaTagger
    >>> senna = SennaTagger('/home/alvas/senna/')
    >>> sents = ["All the banks are closed", "Today is Sunday"]
    
    >>> tokenized_sents = [word_tokenize(sent) for sent in sents]
    >>> senna.tag_sents(tokenized_sents)
    [[('All', u'PDT'), ('the', u'DT'), ('banks', u'NNS'), ('are', u'VBP'), ('closed', u'VBN')], [('Today', u'NN'), ('is', u'VBZ'), ('Sunday', u'NNP')]]
    

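    Or use map instead of the list comprehension. The output below is from Python 2, where map returns a list; in Python 3 map returns a lazy iterator, so wrap it in list(...) if tag_sents needs to index into its input: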

    >>> tokenized_sents = map(word_tokenize, sents)
    >>> senna.tag_sents(tokenized_sents)
    [[('All', u'PDT'), ('the', u'DT'), ('banks', u'NNS'), ('are', u'VBP'), ('closed', u'VBN')], [('Today', u'NN'), ('is', u'VBZ'), ('Sunday', u'NNP')]]
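
To make the difference between the two input shapes concrete, here is a minimal sketch that runs without the SENNA binary. Note that ensure_tokenized is a hypothetical helper (it is not part of NLTK), and str.split is only a stand-in for word_tokenize so the example needs no NLTK data; in real code you would pass word_tokenize as the tokenize argument and hand the result to senna.tag_sents.

```python
# Hypothetical helper, not part of NLTK: normalizes input for tag_sents().
def ensure_tokenized(sents, tokenize=str.split):
    """Return a list of token lists, tokenizing any plain-string sentence."""
    tokenized = []
    for sent in sents:
        if isinstance(sent, str):
            # A bare string would otherwise be iterated character by character.
            tokenized.append(tokenize(sent))
        else:
            # Already a sequence of tokens: copy it into a plain list.
            tokenized.append(list(sent))
    return tokenized


sents = ["All the banks are closed", "Today is Sunday"]

# Iterating a string yields characters, not words.
print(list(sents[1])[:5])
# ['T', 'o', 'd', 'a', 'y']

# str.split is an assumption standing in for nltk.word_tokenize.
tokenized_sents = ensure_tokenized(sents)
print(tokenized_sents)
# [['All', 'the', 'banks', 'are', 'closed'], ['Today', 'is', 'Sunday']]
```

Returning a real list (rather than a generator or map object) keeps the input safe to iterate more than once and to index into, whichever Python version you are on.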