nlp, artificial-intelligence, word-sense-disambiguation

Word sense disambiguation for pair of words


Say that I have a word A and a word B, where B serves as a hint that implies the meaning of A. For instance, A = bass, B = music. Given this word pair, a human can immediately tell which sense of A is meant.

I know that there are lots of algorithms that work on sentences. I'm wondering whether any algorithms have been developed for doing WSD on just a pair of words.


Solution

  • Word Sense Disambiguation (WSD) is the task of determining which sense of a word is intended, given a context sentence or document. In the case of a two-token phrase, the context is simply the other token.

    You can try out different WSD tools; there is a list in the question "Anyone know of some good Word Sense Disambiguation software?"

    I'll give you an example using pywsd (https://github.com/alvations/pywsd):

    $ wget https://github.com/alvations/pywsd/archive/master.zip
    $ unzip master.zip
    $ cd pywsd-master
    $ python
    Python 2.7.5+ (default, Feb 27 2014, 19:37:08) 
    [GCC 4.8.1] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from lesk import simple_lesk
    # disambiguating the word 'bass' given the context 'bass music'
    >>> simple_lesk('bass music', 'bass') 
    Synset('bass.n.07')
    >>> disambiguated = simple_lesk('bass music', 'bass')
    >>> disambiguated.definition
    <bound method Synset.definition of Synset('bass.n.07')>
    >>> disambiguated.definition()
    u'the member with the lowest range of a family of musical instruments'
    

    Alternatively, you can use a new module in NLTK (https://github.com/nltk/nltk/blob/develop/nltk/wsd.py), provided you have the bleeding-edge version:

    from nltk.wsd import lesk
    # Recent NLTK releases expect the context as a list of tokens rather than a raw string.
    disambiguated = lesk(context_sentence="bass music".split(), ambiguous_word="bass")
    print(disambiguated.definition())
    

    (Disclaimer: I wrote both pywsd and the lesk module in NLTK)
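
To make the "the context is the other token" idea concrete, here is a minimal sketch of a Lesk-style overlap for a word pair. It is not the pywsd or NLTK implementation. The sense inventory `TOY_SENSES`, its sense labels and glosses, the stopword list, and the function `disambiguate_pair` are all made up for illustration; a real system would take its senses and glosses from WordNet.

```python
# Toy sense inventory: hypothetical labels and glosses written for this
# example. They are NOT WordNet's synsets or definitions.
TOY_SENSES = {
    "bass": {
        "bass.music": "the lowest part in music or the lowest range of a musical instrument",
        "bass.fish": "an edible freshwater or marine fish caught by anglers",
    },
}

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "or", "in", "by", "for", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {tok for tok in text.lower().split() if tok not in STOPWORDS}


def disambiguate_pair(target, hint, senses=TOY_SENSES):
    """Return the sense of `target` whose gloss shares the most words with `hint`.

    Ties (including zero overlap everywhere) fall back to the first sense listed.
    """
    context = tokenize(hint)
    best_sense, best_overlap = None, -1
    for sense, gloss in senses[target].items():
        overlap = len(context & tokenize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(disambiguate_pair("bass", "music"))  # bass.music
print(disambiguate_pair("bass", "fish"))   # bass.fish
```

With a single hint word the overlap is at most one token, so this only works when the hint appears literally in a gloss. A hint such as "guitar" would match neither toy gloss and fall back to the first sense, which is why a one-word context benefits from a richer signature per sense than the bare definition.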