
I am having problems doing Word Sense Disambiguation in Python using the Lesk algorithm

I am new to Python and NLTK, so please bear with me. I want to find the sense of a word in the context of a sentence. I am using the Lesk WSD algorithm, but it gives different outputs each time I run it. I know that Lesk is not fully accurate, but I think supplying a POS tag will improve its accuracy.

The Lesk function takes a POS tag as an argument, but it expects WordNet-style tags such as 'n', 's', 'v' and not 'NN', 'VBP' or the other tags output by the pos_tag() function. I would like to know how to tag words with 'n', 's', 'v' directly, or how to convert 'NN', 'VBP' and the other tags into 'n', 's', 'v', so that I can pass them to the lesk(context_sentence, word, pos_tag) function.

Afterwards, I calculate the sentiment score of every word using SentiWordNet.

    import re

    from nltk import word_tokenize
    from nltk.corpus import sentiwordnet as swn
    from nltk.wsd import lesk

    def word_sense():
        sent = word_tokenize("He should be happy.")
        word = "be"
        pos = "v"
        score = lesk(sent, word, pos)
        print(str(score), type(score))
        # Extract the quoted name from the printed result
        set1 = re.findall("'([^']*)'", str(score))[0]
        print(set1)
        bank = swn.senti_synset(str(set1))
        print(bank)

    word_sense()



  • nltk.wsd.lesk does not return a score; it returns the predicted Synset:

    >>> import re
    >>> from nltk.corpus import wordnet as wn
    >>> from nltk.corpus import sentiwordnet as swn
    >>> from nltk import word_tokenize
    >>> from nltk.wsd import lesk
    >>> sent = word_tokenize("He should be happy".lower())
    >>> lesk(sent, 'be', 'v')

    lesk is not perfect; it should only be used as a baseline system for WSD.

    Although this works:

    >>> ss = str(lesk(sent, 'be', 'v'))
    >>> re.findall("'([^']*)'",ss)

    There's a simpler way to get the synset identifier:

    >>> lesk(sent, 'be', 'v').name()

    Then you can do:

    >>> swn.senti_synset(lesk(sent, 'be', 'v').name())

    To convert the Penn Treebank tags from pos_tag() to WordNet POS tags, see the related question "Converting POS tags from TextBlob into Wordnet compatible inputs".
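
A minimal sketch of that conversion, in case it helps. The function names penn_to_wordnet and disambiguate are made up for illustration; they are not part of NLTK. The mapping goes by tag prefix and returns None for tags that have no WordNet counterpart, which lesk treats as "no POS filter". Penn Treebank tags do not distinguish WordNet's satellite adjectives ('s'), so mapping every adjective tag to 'a' is a simplification. The second function assumes the NLTK tokenizer, tagger and WordNet data have already been downloaded.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (as output by nltk.pos_tag) to a WordNet POS letter.

    Returns None for tags with no WordNet counterpart (e.g. 'DT', 'PRP', 'MD').
    """
    if tag.startswith('JJ'):
        return 'a'  # adjective (same value as wn.ADJ)
    if tag.startswith('VB'):
        return 'v'  # verb (wn.VERB)
    if tag.startswith('NN'):
        return 'n'  # noun (wn.NOUN)
    if tag.startswith('RB'):
        return 'r'  # adverb (wn.ADV)
    return None


def disambiguate(sentence, word):
    """Tag the sentence, convert the target word's tag, and run lesk with it.

    Hypothetical helper: returns whatever lesk returns, i.e. a Synset,
    or None when WordNet has no candidate sense for the word and POS.
    """
    # Imported lazily so penn_to_wordnet can be used without NLTK installed.
    from nltk import pos_tag, word_tokenize
    from nltk.wsd import lesk

    tokens = word_tokenize(sentence)
    # Tag of the first occurrence of the target word (case-insensitive), if any.
    tag = next((t for w, t in pos_tag(tokens) if w.lower() == word.lower()), '')
    context = [w.lower() for w in tokens]
    return lesk(context, word.lower(), penn_to_wordnet(tag))
```

With this, something like disambiguate("He should be happy.", "be") would stand in for the hard-coded 'v'. Since the result can be None, it is worth checking it before calling .name() on it or passing it to swn.senti_synset.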