NLTK lemmatization wrong result

I've used NLTK and got a wrong result like this:

>>> print lmtzr.lemmatize('coding', 'v')
cod

I expected the answer to be "code" rather than a fish. Is there any way to fix this, or is there another Python library that does a better job?

Solution

One way to fix this is to add the word 'coding' to wordnet._exception_map:

import nltk.stem as stem
import nltk.corpus as corpus

# Map the inflected verb form 'coding' to the lemma 'code'.
wordnet = corpus.wordnet
wordnet._exception_map['v']['coding'] = ['code']
wnl = stem.WordNetLemmatizer()

print(wnl.lemmatize('coding', 'v'))
# code
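
To see why an exception entry changes the result, here is a deliberately simplified model of an exception-first lookup. It is not NLTK's actual code: the rule list, the tiny vocabulary, and the function names are assumptions made up for illustration. It only assumes that exceptions are consulted before suffix rules and that the shortest surviving candidate is returned.

```python
# Toy model only: these rules and this vocabulary are illustrative,
# not the data that NLTK or WordNet actually ship.
VERB_SUFFIX_RULES = [("ing", "e"), ("ing", "")]
KNOWN_VERBS = {"cod", "code"}


def candidate_lemmas(word, exceptions):
    """Return candidate lemmas, letting the exception table win."""
    if word in exceptions:
        return list(exceptions[word])
    candidates = []
    for old, new in VERB_SUFFIX_RULES:
        if word.endswith(old):
            stem = word[: -len(old)] + new
            # Keep only stems that are known verbs, without duplicates.
            if stem in KNOWN_VERBS and stem not in candidates:
                candidates.append(stem)
    return candidates


def lemmatize(word, exceptions):
    """Pick the shortest candidate, or fall back to the word itself."""
    candidates = candidate_lemmas(word, exceptions)
    return min(candidates, key=len) if candidates else word


print(lemmatize("coding", {}))                    # cod
print(lemmatize("coding", {"coding": ["code"]}))  # code
```

Without an exception, both "code" and "cod" survive the suffix rules and the shorter one wins. With the exception in place, the suffix rules are never reached.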

Note that attributes which start with a single underscore are considered private -- i.e. they are not part of the public interface. So modifying wordnet._exception_map as above is not guaranteed to work in future versions of nltk. (The above works with NLTK version 3.0.0. It was found by looking at the source code for WordNetLemmatizer.lemmatize and wordnet._morphy.)
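
If relying on a private attribute is a concern, one alternative is to keep the overrides in your own code and only fall back to the lemmatizer for everything else. The sketch below is a hypothetical helper (OverrideLemmatizer is not part of NLTK); it assumes only that the wrapped object has a lemmatize(word, pos) method.

```python
class OverrideLemmatizer:
    """Hypothetical wrapper: check a user-supplied table before
    delegating to any object that exposes lemmatize(word, pos)."""

    def __init__(self, base, overrides=None):
        self.base = base
        # Maps (word, pos) -> lemma.
        self.overrides = dict(overrides or {})

    def lemmatize(self, word, pos="n"):
        key = (word, pos)
        if key in self.overrides:
            return self.overrides[key]
        return self.base.lemmatize(word, pos)


# Possible usage with NLTK (needs the WordNet corpus installed):
#   from nltk.stem import WordNetLemmatizer
#   wnl = OverrideLemmatizer(WordNetLemmatizer(), {("coding", "v"): "code"})
#   wnl.lemmatize("coding", "v")
```

This leaves NLTK's internal state untouched, at the cost of only affecting calls that go through the wrapper.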

Another way to fix the problem is to modify nltk_data/corpora/wordnet/verb.exc. The contents of the file look like this:

cockneyfied cockneyfy
codded cod
codding cod
codified codify
cogged cog
cogging cog

If you add the line

coding code

then this exception is added to wordnet._exception_map automatically for you.
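
For illustration, a file in this format can be turned into a lookup table roughly as follows. This is a sketch assuming each line holds an inflected form followed by one or more lemmas, separated by whitespace; parse_exception_lines is a made-up name, not an NLTK function, and the sample text is just the lines quoted above plus the new entry.

```python
SAMPLE_VERB_EXC = """\
codded cod
codding cod
codified codify
coding code
"""


def parse_exception_lines(lines):
    """Build a mapping of inflected form -> list of lemmas."""
    mapping = {}
    for line in lines:
        terms = line.split()
        if len(terms) < 2:
            continue  # skip blank or malformed lines
        mapping[terms[0]] = terms[1:]
    return mapping


verb_exceptions = parse_exception_lines(SAMPLE_VERB_EXC.splitlines())
print(verb_exceptions["coding"])  # ['code']
```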

The third option, less hacky than the previous two, is to convince the developers of WordNet to add coding code to nltk_data/corpora/wordnet/verb.exc.