Important Edit
As @Pengin pointed out in the comments, NLTK has supported WordNet 3.1 since January 2022, so this question is no longer relevant.
I need to use WordNet 3.1 for my research, but NLTK (Python) ships with WordNet 3.0 by default, and it is important that I use the latest version of WordNet.
>>> from nltk.corpus import wordnet
>>> wordnet.get_version()
'3.0'
Since WordNet 3.1 is the latest version, and I cannot find any way to download and access it using nltk.download(), I am looking for a workaround.
The WordNet website (current version page) says:
WordNet 3.1 DATABASE FILES ONLY
You can download the WordNet 3.1 database files. Note that this is not a full package as those above, nor does it contain any code for running WordNet. However, you can replace the files in the database directory of your 3.0 local installation with these files and the WordNet interface will run, returning entries from the 3.1 database. This is simply a compressed tar file of the WordNet 3.1 database files.
I downloaded the WordNet 3.1 database files and used them to replace the default WordNet files under C:\Users\<username>\AppData\Roaming\nltk_data\corpora (on Windows). I suspected this would not work, since the instructions say to replace the database files of a WordNet software installation, but I tried anyway.
Running wordnet.get_version() now raises the following error:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-2-d64ae1e68b36> in <module>
----> 1 wordnet.get_version()
~\anaconda3\lib\site-packages\nltk\corpus\util.py in __getattr__(self, attr)
118 raise AttributeError("LazyCorpusLoader object has no attribute '__bases__'")
119
--> 120 self.__load()
121 # This looks circular, but its not, since __load() changes our
122 # __class__ to something new:
~\anaconda3\lib\site-packages\nltk\corpus\util.py in __load(self)
86
87 # Load the corpus.
---> 88 corpus = self.__reader_cls(root, *self.__args, **self.__kwargs)
89
90 # This is where the magic happens! Transform ourselves into
~\anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py in __init__(self, root, omw_reader)
1136
1137 # Load the lexnames
-> 1138 for i, line in enumerate(self.open("lexnames")):
1139 index, lexname, _ = line.split()
1140 assert int(index) == i
~\anaconda3\lib\site-packages\nltk\corpus\reader\api.py in open(self, file)
206 """
207 encoding = self.encoding(file)
--> 208 stream = self._root.join(file).open(encoding)
209 return stream
210
~\anaconda3\lib\site-packages\nltk\data.py in join(self, fileid)
335 def join(self, fileid):
336 _path = os.path.join(self._path, fileid)
--> 337 return FileSystemPathPointer(_path)
338
339 def __repr__(self):
~\anaconda3\lib\site-packages\nltk\compat.py in _decorator(*args, **kwargs)
39 def _decorator(*args, **kwargs):
40 args = (args[0], add_py3_data(args[1])) + args[2:]
---> 41 return init_func(*args, **kwargs)
42
43 return wraps(init_func)(_decorator)
~\anaconda3\lib\site-packages\nltk\data.py in __init__(self, _path)
313 _path = os.path.abspath(_path)
314 if not os.path.exists(_path):
--> 315 raise IOError("No such file or directory: %r" % _path)
316 self._path = _path
317
OSError: No such file or directory: 'C:\\Users\\Punit Singh\\AppData\\Roaming\\nltk_data\\corpora\\wordnet\\lexnames'
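I then checked the folder structure and found that the WordNet 3.1 files include no lexnames file, which NLTK's WordNet reader opens when it loads. The before and after trees are listed below.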
File tree in WordNet 3.0
wordnet
├── adj.exc
├── adv.exc
├── citation.bib
├── cntlist.rev
├── data.adj
├── data.adv
├── data.noun
├── data.verb
├── index.adj
├── index.adv
├── index.noun
├── index.sense
├── index.verb
├── lexnames
├── LICENSE
├── noun.exc
├── README
└── verb.exc
File tree in WordNet 3.1
wordnet
├── adj.exc
├── adv.exc
├── cntlist
├── cntlist.rev
├── cousin.exc
├── data.adj
├── data.adv
├── data.noun
├── data.verb
├── index.adj
├── index.adv
├── index.noun
├── index.sense
├── index.verb
├── log.grind.3.1
├── noun.exc
├── sentidx.vrb
└── dbfiles
    ├── adj.all
    ├── adj.pert
    ├── adj.ppl
    ├── adv.all
    ├── cntlist
    ├── noun.act
    ├── noun.animal
    ├── noun.artifact
    ├── noun.attribute
    ├── noun.body
    ├── noun.cognition
    ├── noun.communication
    ├── noun.event
    ├── noun.feeling
    ├── noun.food
    ├── noun.group
    ├── noun.location
    ├── noun.motive
    ├── noun.object
    ├── noun.person
    ├── noun.phenomenon
    ├── noun.plant
    ├── noun.possession
    ├── noun.process
    ├── noun.quantity
    ├── noun.relation
    ├── noun.shape
    ├── noun.state
    ├── noun.substance
    ├── noun.time
    ├── noun.Tops
    ├── verb.body
    ├── verb.change
    ├── verb.cognition
    ├── verb.communication
    ├── verb.competition
    ├── verb.consumption
    ├── verb.contact
    ├── verb.creation
    ├── verb.emotion
    ├── verb.Framestext
    ├── verb.motion
    ├── verb.perception
    ├── verb.possession
    ├── verb.social
    ├── verb.stative
    └── verb.weather
Any suggestions on how to use WordNet 3.1 with NLTK (Python) would be helpful.
After a lot of searching and trial and error, I was able to use WordNet 3.1 with NLTK (Python) by tweaking this gist. The details are below.
I split the code from the gist into three parts.
Part 1. download_extract.py
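import os

nltkdata_wn = '/path/to/nltk_data/corpora/wordnet/'
backup_wn = nltkdata_wn.rstrip('/') + '_3.0/'  # sibling folder: .../corpora/wordnet_3.0/
wn31 = "http://wordnetcode.princeton.edu/wn3.1.dict.tar.gz"

# Back up the existing WordNet 3.0 files into wordnet_3.0/
if not os.path.exists(backup_wn):
    os.mkdir(backup_wn)
    os.system('mv ' + nltkdata_wn + '* ' + backup_wn)

# Download the WordNet 3.1 database files (requires wget)
if not os.path.exists('wn3.1.dict.tar.gz'):
    os.system('wget ' + wn31)

# Extract the archive (it unpacks into dict/) and move its contents into wordnet/
os.system('tar zxf wn3.1.dict.tar.gz -C ' + nltkdata_wn)
os.system('mv ' + nltkdata_wn + 'dict/* ' + nltkdata_wn)
os.rmdir(nltkdata_wn + 'dict')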
This backs up the existing WordNet 3.0 files from wordnet to wordnet_3.0, downloads the WordNet 3.1 database, and extracts it into the wordnet folder. Because the script calls Unix commands (mv, wget, tar) and I am on Windows, I did these steps manually.
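For Windows users, where cmd.exe does not provide mv or wget, the same three steps can be sketched in pure Python with shutil, tarfile, and urllib.request. This is a minimal sketch, not the gist's code: install_wn31 is a hypothetical helper name, it assumes (as Part 1 does) that the archive unpacks into a dict/ folder, and it is meant to be run once against a fresh 3.0 folder (re-running it will fail because the moved files already exist).

```python
import os
import shutil
import tarfile
import urllib.request

# Download URL taken from Part 1
WN31_URL = "http://wordnetcode.princeton.edu/wn3.1.dict.tar.gz"


def install_wn31(nltkdata_wn, archive="wn3.1.dict.tar.gz", url=WN31_URL):
    """Back up wordnet/ as wordnet_3.0/, then unpack the 3.1 files into wordnet/.

    Hypothetical helper; intended to run once on a fresh 3.0 folder.
    """
    nltkdata_wn = os.path.normpath(nltkdata_wn)  # drop any trailing separator
    backup = nltkdata_wn + "_3.0"                # sibling folder, e.g. .../corpora/wordnet_3.0

    # Step 1: move the whole 3.0 folder aside and recreate an empty wordnet/
    if not os.path.exists(backup):
        shutil.move(nltkdata_wn, backup)
        os.mkdir(nltkdata_wn)

    # Step 2: download the 3.1 database archive unless it is already present
    if not os.path.exists(archive):
        urllib.request.urlretrieve(url, archive)

    # Step 3: the archive unpacks into dict/ (as Part 1 assumes); move its contents up a level
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(nltkdata_wn)
    extracted = os.path.join(nltkdata_wn, "dict")
    for name in os.listdir(extracted):
        shutil.move(os.path.join(extracted, name), nltkdata_wn)
    os.rmdir(extracted)
```

On Windows this would be called as install_wn31(r"C:\Users\<username>\AppData\Roaming\nltk_data\corpora\wordnet"), with the archive downloaded into the current working directory.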
Part 2. create_lexnames.py
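import os

nltkdata_wn = '/path/to/nltk_data/corpora/wordnet/'
dbfiles = nltkdata_wn + 'dbfiles'

with open(nltkdata_wn + 'lexnames', 'w') as fout:
    # Write one "NN<TAB>filename<TAB>syncat" line per file in dbfiles/.
    # Note: the numbers follow sorted() order (uppercase first, so noun.Tops
    # comes before noun.act), not the numbering of the WordNet 3.0 lexnames file.
    for i, j in enumerate(sorted(os.listdir(dbfiles))):
        pos = j.partition('.')[0]
        if pos == "noun":
            syncat = 1
        elif pos == "verb":
            syncat = 2
        elif pos == "adj":
            syncat = 3
        elif pos == "adv":
            syncat = 4
        elif j == "cntlist":
            syncat = "cntlist"
        fout.write("\t".join([str(i).zfill(2), j, str(syncat)]) + "\n")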
This creates the required lexnames file in the wordnet folder.
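One caveat about the numbering: each line of a data.* file stores the synset's lexicographer file number (lex_filenum), and, as far as I can tell, NLTK uses it as an index into lexnames to produce Synset.lexname(). The sorted order above gives adj.ppl 02, cntlist 04 and noun.Tops 05, whereas the WordNet 3.0 lexnames file numbers noun.Tops 03 and adj.ppl 44, so lexname() would likely return wrong names even though NLTK's index check passes. The sketch below writes the 3.0 numbering instead, assuming 3.1 keeps it (its dbfiles folder holds the same 45 lexicographer files, plus cntlist and verb.Framestext). write_lexnames is a hypothetical helper; copying lexnames from the backed-up wordnet_3.0 folder should give the same file.

```python
import os

# Lexicographer files in the order of the WordNet 3.0 lexnames file;
# a name's position is its lex_filenum (assumed unchanged in 3.1).
LEXNAMES = [
    "adj.all", "adj.pert", "adv.all", "noun.Tops", "noun.act", "noun.animal",
    "noun.artifact", "noun.attribute", "noun.body", "noun.cognition",
    "noun.communication", "noun.event", "noun.feeling", "noun.food",
    "noun.group", "noun.location", "noun.motive", "noun.object",
    "noun.person", "noun.phenomenon", "noun.plant", "noun.possession",
    "noun.process", "noun.quantity", "noun.relation", "noun.shape",
    "noun.state", "noun.substance", "noun.time", "verb.body", "verb.change",
    "verb.cognition", "verb.communication", "verb.competition",
    "verb.consumption", "verb.contact", "verb.creation", "verb.emotion",
    "verb.motion", "verb.perception", "verb.possession", "verb.social",
    "verb.stative", "verb.weather", "adj.ppl",
]

# Syntactic category numbers for the third column
SYNCAT = {"noun": 1, "verb": 2, "adj": 3, "adv": 4}


def write_lexnames(nltkdata_wn):
    """Write lexnames as 'NN<TAB>name<TAB>syncat' lines in 3.0 order."""
    with open(os.path.join(nltkdata_wn, "lexnames"), "w") as fout:
        for i, name in enumerate(LEXNAMES):
            syncat = SYNCAT[name.partition(".")[0]]
            fout.write(f"{i:02d}\t{name}\t{syncat}\n")
```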
Part 3. testing_wn31.py
from nltk.corpus import wordnet as wn

nltkdata_wn = '/path/to/nltk_data/corpora/wordnet/'

# Check the generated lexnames file the same way NLTK's reader does.
with open(nltkdata_wn + 'lexnames', 'r') as f:
    for i, line in enumerate(f):
        index, lexname, _ = line.split()
        assert int(index) == i

# Test the WordNet functions.
print(wn.synsets('dog'))
for i in wn.all_synsets():
    print(i, i.pos(), i.definition())
This checks the generated lexnames file and tests whether the WordNet functions work.
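The index check above only confirms that the numbers run 00, 01, 02, and so on; it does not confirm they match the lex_filenum values in the data files. A minimal cross-check, assuming the wndb(5WN) data-file layout (license header lines start with two spaces; each synset line begins with synset_offset, lex_filenum, ss_type), could compare each synset's part of speech with the prefix of the lexname it maps to. check_lexnames and POS_PREFIX are hypothetical names.

```python
import os

# ss_type letters in data.* files mapped to the lexname prefix they should carry
POS_PREFIX = {"n": "noun", "v": "verb", "a": "adj", "s": "adj", "r": "adv"}


def check_lexnames(nltkdata_wn):
    """Return (offset, lexname, ss_type) tuples whose lexname disagrees with the POS."""
    with open(os.path.join(nltkdata_wn, "lexnames")) as f:
        lexnames = [line.split()[1] for line in f if line.strip()]
    mismatches = []
    for pos in ("noun", "verb", "adj", "adv"):
        path = os.path.join(nltkdata_wn, "data." + pos)
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                if line.startswith(" "):  # license header lines start with two spaces
                    continue
                offset, lex_filenum, ss_type = line.split()[:3]
                lexname = lexnames[int(lex_filenum)]
                if lexname.partition(".")[0] != POS_PREFIX[ss_type]:
                    mismatches.append((offset, lexname, ss_type))
    return mismatches
```

An empty list suggests the numbering agrees with the data files. With the sorted-order file from Part 2, I would expect mismatches such as noun.Tops synsets (lex_filenum 03) mapping to adv.all.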
Once I had done this, I ran the following code in Python and confirmed that NLTK is now using version 3.1:
>>> from nltk.corpus import wordnet
>>> wordnet.get_version()
'3.1'
A Word of Caution
After you install the WordNet 3.1 database this way, if you run the following code
>>> import nltk
>>> nltk.download()
the download dialog will show WordNet as out of date under the Corpora tab. Do not update it: doing so will either revert WordNet to version 3.0 or break it.