I am trying to use the Spanish Wordnet from the Open Multilingual Wordnet in NLTK 3.0, but it seems that it was not downloaded with the 'omw' package. For example, with code like the following:
from nltk.corpus import wordnet as wn
print [el.lemma_names('spa') for el in wn.synsets('bank')]
I get the following error message:
IOError: No such file or directory: u'***/nltk_data/corpora/omw/spa/wn-data-spa.tab'
According to the documentation, Spanish should be included in the 'omw' package, but it was not downloaded with it. Do you know why this could happen?
Here's the full error traceback you get if a language is missing from the Open Multilingual WordNet in your nltk_data:
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('bank')[0].lemma_names('spa')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/wordnet.py", line 418, in lemma_names
File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/wordnet.py", line 1070, in _load_lang_data
f = self._omw_reader.open('{0:}/wn-data-{0:}.tab'.format(lang))
File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/api.py", line 198, in open
stream = self._root.join(file).open(encoding)
File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 309, in join
return FileSystemPathPointer(_path)
File "/usr/local/lib/python2.7/dist-packages/nltk/compat.py", line 380, in _decorator
return init_func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 287, in __init__
raise IOError('No such file or directory: %r' % _path)
IOError: No such file or directory: u'/home/alvas/nltk_data/corpora/omw/spa/wn-data-spa.tab'
So the first thing to check is whether the 'omw' package has actually been downloaded:
>>> import nltk
>>> nltk.download('omw')
[nltk_data] Downloading package omw to /home/alvas/nltk_data...
[nltk_data] Package omw is already up-to-date!
Then check the nltk_data directory, where you'll find that the 'spa' folder is missing:
alvas@ubi:~/nltk_data/corpora/omw$ ls
als arb cmn dan eng fas fin fra fre heb ita jpn mcr msa nor pol por README tha
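The same check can be scripted. Below is a minimal sketch that relies on the path pattern visible in the traceback ('{0:}/wn-data-{0:}.tab'). The helper names omw_tab_path and missing_omw_languages are made up for illustration, and the default ~/nltk_data location is an assumption about your setup:

```python
import os


def omw_tab_path(omw_root, lang):
    """Build the path the reader tries to open: <omw_root>/<lang>/wn-data-<lang>.tab."""
    return os.path.join(omw_root, lang, "wn-data-{0}.tab".format(lang))


def missing_omw_languages(omw_root, langs):
    """Return the language codes whose .tab file does not exist under omw_root."""
    return [lang for lang in langs
            if not os.path.isfile(omw_tab_path(omw_root, lang))]


# Assumed default location; adjust if your nltk_data lives elsewhere.
omw_root = os.path.expanduser(os.path.join("~", "nltk_data", "corpora", "omw"))
print(missing_omw_languages(omw_root, ["spa", "fra", "ita"]))
```

Any code printed by this script is a language for which lemma_names() would raise the IOError shown above.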
So here's the short-term solution:
$ wget http://compling.hss.ntu.edu.sg/omw/wns/spa.zip
$ mkdir ~/nltk_data/corpora/omw/spa
$ unzip -p spa.zip mcr/wn-data-spa.tab > ~/nltk_data/corpora/omw/spa/wn-data-spa.tab
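If you'd rather do the unzip step from Python (for example on a machine without unzip), here is a rough equivalent using only the standard library. The function name extract_omw_tab is hypothetical, and it assumes you have already downloaded spa.zip and that the archive contains the member mcr/wn-data-spa.tab, as the unzip command above implies:

```python
import os
import zipfile


def extract_omw_tab(zip_path, member, dest_path):
    """Write a single member of the archive to dest_path, creating its folder if needed."""
    dest_dir = os.path.dirname(dest_path)
    if dest_dir and not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    # Read the member's bytes, then write them out unchanged.
    with zipfile.ZipFile(zip_path) as archive:
        data = archive.read(member)
    with open(dest_path, "wb") as out:
        out.write(data)
    return dest_path


# Example call (paths are assumptions about your setup):
# extract_omw_tab("spa.zip", "mcr/wn-data-spa.tab",
#                 os.path.expanduser("~/nltk_data/corpora/omw/spa/wn-data-spa.tab"))
```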
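Alternatively, you can simply copy the file from nltk_data/corpora/omw/mcr/wn-data-spa.tab into a new nltk_data/corpora/omw/spa/ folder, which is where the error message says NLTK looks for it.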
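The copy can be scripted as well. This is only a sketch: copy_omw_tab is a made-up helper, and the destination omw/<lang>/wn-data-<lang>.tab is inferred from the path in the error message rather than from any NLTK documentation:

```python
import os
import shutil


def copy_omw_tab(omw_root, src_folder, lang):
    """Copy <omw_root>/<src_folder>/wn-data-<lang>.tab to <omw_root>/<lang>/."""
    filename = "wn-data-{0}.tab".format(lang)
    src = os.path.join(omw_root, src_folder, filename)
    dest_dir = os.path.join(omw_root, lang)
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    dest = os.path.join(dest_dir, filename)
    shutil.copyfile(src, dest)
    return dest


# Example call (the omw_root path is an assumption about your setup):
# copy_omw_tab(os.path.expanduser("~/nltk_data/corpora/omw"), "mcr", "spa")
```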
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('bank')[0].lemma_names('spa')
[u'margen', u'orilla', u'vera']
Now lemma_names() should work for Spanish. If you're looking for other languages from the Open Multilingual Wordnet, you can browse them here (http://compling.hss.ntu.edu.sg/omw/), then download the data and put it in the corresponding nltk_data directory.
The long-term solution would be to ask the devs from the NLTK and OMW projects to update the datasets for the NLTK API.