python nltk wordnet

Manually install Open Multilingual Wordnet (NLTK)


I am working on a computer that can only access a private network and cannot send instructions from the command line. So, whenever I have to install Python packages, I must do it manually (I can't even use PyPI). Luckily, NLTK allows me to manually download corpora (from here) and to "install" them by putting them in the proper folder (as explained here).

Now, I need to do exactly what is said in this answer:


>>> cane_lemmas = wn.lemmas("cane", lang="ita")
>>> print(cane_lemmas) 

[Lemma('dog.n.01.cane'), Lemma('cramp.n.02.cane'), Lemma('hammer.n.01.cane'), Lemma('bad_person.n.01.cane'), Lemma('incompetent.n.01.cane')]

To do so, I thought it would be enough to download the file "52. Open Multilingual Wordnet", unzip it in C:\nltk_data\corpora, and run the previously mentioned code after importing:

from nltk.corpus import wordnet as wn

However, when I run the code:


>>> cane_lemmas = wn.lemmas("cane", lang="ita")
>>> print(cane_lemmas) 

I get this error:

WordNetError: line 'es; it won brilliant victories over British frigates during the War of 1812 and is without doubt the most famous ship in the history of the United States Navy; it has been rebuilt and is anchored in the Charlestown Navy Yard in Boston \n': not enough values to unpack (expected 2, got 1)

However, if I run:

>>> cane_lemmas = wn.lemmas("dog", lang="eng")
>>> print(cane_lemmas) 

I correctly get:

[Lemma('dog.n.01.dog'), Lemma('frump.n.01.dog'), Lemma('dog.n.03.dog'), Lemma('cad.n.01.dog'), Lemma('frank.n.02.dog'), Lemma('pawl.n.01.dog'), Lemma('andiron.n.01.dog'), Lemma('chase.v.01.dog')]

What am I doing wrong?

I am using Python 3.7.4 and NLTK 3.4.5.


Solution

  • To be certain, can you verify your current nltk_data folder structure? The correct structure is:

    nltk_data
    + corpora
      + wordnet
        + adj.exc
        + adv.exc
        + ...
      + omw
        + ...
        + ita
          + citation.bib
          + LICENSE
          + ...
        + ...
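A quick way to compare a manual install against the tree above is a small standard-library check. This is only a sketch: the function name `check_nltk_data_layout` is hypothetical, and it tests just the handful of paths shown in the tree (with `ita` as the example language), not the full contents of either package.

```python
from pathlib import Path


def check_nltk_data_layout(root, lang="ita"):
    """Return the expected paths (relative to root) that are missing.

    Only the entries shown in the tree above are checked; this is an
    illustrative helper, not part of NLTK.
    """
    root = Path(root)
    expected = [
        root / "corpora" / "wordnet",
        root / "corpora" / "wordnet" / "adj.exc",
        root / "corpora" / "omw",
        root / "corpora" / "omw" / lang,
    ]
    # Report missing entries as forward-slash paths relative to root.
    return [p.relative_to(root).as_posix() for p in expected if not p.exists()]
```

For example, `check_nltk_data_layout(r"C:\nltk_data")` would return an empty list if everything is in place. A result such as `["corpora/omw/ita"]` often points to an archive that was unzipped one level too deep (for instance `corpora/omw/omw/ita`).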
    

    However, in most situations where the issue is due to an incorrect nltk_data install, NLTK will notify you that there was an issue with the install (and that you must perform e.g. nltk.download("wordnet") to resolve it)

    I believe that in order to do what you're suggesting, you must have both the wordnet and omw packages installed.
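Since `nltk.download` is not an option on an offline machine, one way to confirm that NLTK itself can locate both packages is to ask `nltk.data.find`, which raises `LookupError` when a resource is not on NLTK's search path. The helper below is a sketch; `missing_packages` and its `finder` parameter are hypothetical names introduced here for illustration.

```python
def missing_packages(names=("wordnet", "omw"), finder=None):
    """Return the corpus package names that the finder cannot locate.

    `finder` defaults to nltk.data.find; it is a parameter only so the
    logic can be exercised without NLTK installed (an assumption of this
    sketch, not an NLTK convention).
    """
    if finder is None:
        from nltk.data import find as finder  # real NLTK lookup function
    missing = []
    for name in names:
        try:
            finder(f"corpora/{name}")
        except LookupError:
            missing.append(name)
    return missing

# On the target machine, calling missing_packages() with no arguments
# should return an empty list if both packages are visible to NLTK.
```

If a package is reported missing even though the folder exists, the folder is probably not in one of the directories NLTK searches (these are listed in `nltk.data.path`).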

    Keep in mind that NLTK now supports 2 versions of OMW: omw and omw-1.4 (support for the latter was only added in NLTK 3.6.7). Furthermore, there are 3 versions of wordnet (wordnet, wordnet2021 and wordnet31), plus wordnet_ic, which holds information-content files rather than a WordNet version. However, I believe you should be okay sticking with just omw and wordnet.
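To make the version note concrete, the sketch below maps an NLTK version string to the newest OMW package it should be able to read, taking the 3.6.7 cutoff from the paragraph above at face value. The function name `newest_supported_omw` is hypothetical; on a real install the version string would come from `nltk.__version__`.

```python
import re


def newest_supported_omw(nltk_version):
    """Pick 'omw-1.4' for NLTK >= 3.6.7, otherwise 'omw'.

    The 3.6.7 threshold is the one stated in the answer above, not
    something this sketch verifies.
    """
    # Keep at most the first three numeric components, e.g. "3.4.5" -> (3, 4, 5).
    parts = tuple(int(n) for n in re.findall(r"\d+", nltk_version)[:3])
    return "omw-1.4" if parts >= (3, 6, 7) else "omw"
```

With the NLTK 3.4.5 from the question, this returns `"omw"`, which matches the advice to stick with omw and wordnet.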

    See https://www.nltk.org/nltk_data/ for some more information on the nltk_data packages.