Say I have a word A and a word B, where B is a hint that implies the meaning of A. For instance, A = bass, B = music. Given this word pair, a human immediately knows what the word A means.
I know there are many algorithms that work on sentences. I'm wondering whether any algorithms have been developed for doing WSD on just a pair of words.
Word Sense Disambiguation (WSD) is the task of determining which sense of a word is intended, given a context sentence or document. In the case of a two-token phrase, the context is simply the other token.
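To make the "context is the other token" idea concrete, here is a minimal sketch of a Lesk-style overlap count for a word pair. The sense inventory, the sense labels, and the `toy_lesk` helper are made up for illustration (they are not WordNet entries or part of any library); a real system would take its glosses from a resource such as WordNet.

```python
# Toy sense inventory: labels and glosses are hypothetical, written only
# for this illustration. They are NOT taken from WordNet.
TOY_SENSES = {
    "bass": {
        "bass_instrument": "the lowest range of a family of musical instruments in music",
        "bass_fish": "a lean freshwater fish caught for food",
    },
}


def toy_lesk(ambiguous_word, hint_word, senses=TOY_SENSES):
    """Return the sense label whose gloss shares the most tokens with the hint."""
    # With a word pair, the whole context is the single hint token.
    context = {hint_word.lower()}
    best_label, best_overlap = None, -1
    for label, gloss in senses[ambiguous_word].items():
        # Overlap = number of context tokens that also appear in the gloss.
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:  # on a tie, the first listed sense wins
            best_label, best_overlap = label, overlap
    return best_label


print(toy_lesk("bass", "music"))
print(toy_lesk("bass", "fish"))
```

With a single hint token the overlap can be at most one, and it relies on an exact string match ("music" would not match "musical"). That is why practical implementations usually enlarge the sense signature, for example with example sentences or related lemmas, before counting overlaps.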
You can try out different WSD software; there is a list in the question "Anyone know of some good Word Sense Disambiguation software?"
Here is an example using pywsd (https://github.com/alvations/pywsd):
$ wget https://github.com/alvations/pywsd/archive/master.zip
$ unzip master.zip
$ cd pywsd-master
$ python
Python 2.7.5+ (default, Feb 27 2014, 19:37:08)
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lesk import simple_lesk
>>> # Disambiguate the word 'bass' given the context 'bass music'
>>> simple_lesk('bass music', 'bass')
Synset('bass.n.07')
>>> disambiguated = simple_lesk('bass music', 'bass')
>>> disambiguated.definition
<bound method Synset.definition of Synset('bass.n.07')>
>>> disambiguated.definition()
u'the member with the lowest range of a family of musical instruments'
Alternatively, you can use a new module in NLTK (https://github.com/nltk/nltk/blob/develop/nltk/wsd.py), provided that you have the bleeding-edge version:
from nltk.wsd import lesk
# lesk treats the context as an iterable of tokens, so split the phrase first
disambiguated = lesk(context_sentence="bass music".split(), ambiguous_word="bass")
print(disambiguated.definition())
(Disclaimer: I wrote both pywsd and the lesk module in NLTK.)