I've used NLTK and got a wrong result like this:
>>> print lmtzr.lemmatize('coding', 'v')
cod
I think the answer should be "code" rather than a fish. Is there any way to fix this, or is there another Python library that does a better job?
One way to fix this is to add the word 'coding' to wordnet._exception_map:
import nltk.stem as stem
import nltk.corpus as corpus
wordnet = corpus.wordnet
wordnet._exception_map['v']['coding'] = ['code']
wnl = stem.WordNetLemmatizer()
print(wnl.lemmatize('coding', 'v'))
# code
Note that attributes which start with a single underscore are considered private, i.e. they are not part of the public interface. So modifying wordnet._exception_map as above is not guaranteed to work in future versions of nltk. (The above works with NLTK version 3.0.0. It was found by looking at the source code for WordNetLemmatizer.lemmatize and wordnet._morphy.)
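If relying on a private attribute is a concern, a minimal sketch of an alternative is to keep your own override table and consult it before calling the lemmatizer. The class name OverrideLemmatizer and its overrides argument are hypothetical names made up for this illustration, not part of NLTK; the only assumption about NLTK is that WordNetLemmatizer.lemmatize accepts a word and a part-of-speech tag, as in the snippet above.

```python
class OverrideLemmatizer:
    """Hypothetical wrapper: check a user-supplied override table first,
    then fall back to a base lemmatize callable."""

    def __init__(self, base_lemmatize, overrides=None):
        # base_lemmatize: any callable taking (word, pos) and returning a lemma,
        # for example WordNetLemmatizer().lemmatize
        self._base = base_lemmatize
        # overrides: dict mapping (word, pos) tuples to the desired lemma
        self._overrides = dict(overrides or {})

    def lemmatize(self, word, pos='n'):
        try:
            return self._overrides[(word, pos)]
        except KeyError:
            # No override for this word/pos pair, so defer to the base lemmatizer
            return self._base(word, pos)


# Usage with NLTK (requires the WordNet data to be installed):
# from nltk.stem import WordNetLemmatizer
# wnl = OverrideLemmatizer(WordNetLemmatizer().lemmatize,
#                          {('coding', 'v'): 'code'})
# print(wnl.lemmatize('coding', 'v'))
```

This leaves NLTK's internal state untouched, at the cost of the override applying only to calls made through the wrapper.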
Another way to fix the problem is to modify nltk_data/corpora/wordnet/verb.exc. The contents of the file look like:
cockneyfied cockneyfy
codded cod
codding cod
codified codify
cogged cog
cogging cog
If you add the line
coding code
then this exception is added to wordnet._exception_map automatically for you.
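The edit to verb.exc can also be scripted. The following is a rough sketch only: add_exception_entry is a hypothetical helper, not an NLTK function, and it assumes each line of the file is an inflected form followed by its base form, separated by whitespace, as in the excerpt above. It appends to the end of the file, on the assumption that the file does not need to stay alphabetically sorted for NLTK to read it; back up the file first if in doubt.

```python
def add_exception_entry(exc_path, inflected, lemma):
    """Hypothetical helper: append 'inflected lemma' to an .exc file.

    Returns True if a line was added, False if the inflected form
    already has an entry (the file is then left unchanged).
    """
    with open(exc_path, encoding='utf-8') as f:
        lines = f.read().splitlines()

    # The first whitespace-separated field on each line is the inflected form
    for line in lines:
        fields = line.split()
        if fields and fields[0] == inflected:
            return False

    lines.append(f"{inflected} {lemma}")
    with open(exc_path, 'w', encoding='utf-8') as f:
        f.write("\n".join(lines) + "\n")
    return True


# Usage (the path depends on where nltk_data is installed on your machine):
# add_exception_entry('nltk_data/corpora/wordnet/verb.exc', 'coding', 'code')
```

The change should be picked up the next time the WordNet corpus is loaded, so restart the Python session after editing the file.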
The third option, less hacky than the previous two, is to convince the developers of WordNet to add coding code to nltk_data/corpora/wordnet/verb.exc.