Tags: python, nltk, wordnet

Why does NLTK WordNet fail to find simple words?


I'd like to write a simple function to check whether a word 'exists' in WordNet via NLTK.

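import nltk
from nltk.corpus import wordnet as wn

def is_known(word):
    """Return True if this word "exists" in WordNet
       (or at least in nltk.corpus.stopwords)."""
    # Stopwords are stored in lowercase, so normalize before comparing.
    if word.lower() in nltk.corpus.stopwords.words('english'):
        return True
    # wn.synsets() returns an empty list when WordNet has no entry.
    synsets = wn.synsets(word)
    return len(synsets) > 0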

Why would words like could, since, without, and although return False? Don't they appear in WordNet? Is there a better way to find out whether a word exists in WordNet (using NLTK)?

My first attempt was to filter out "stopwords" (words like to, if, when, then, I, you), but there are still very common words (like could) that I can't find.


Solution

  • WordNet does not contain these words or words like them. For an explanation, see the following from the WordNet docs:

    Q. Why is WordNet missing: of, an, the, and, about, above, because, etc.
    A. WordNet only contains "open-class words": nouns, verbs, adjectives, and adverbs. Thus, excluded words include determiners, prepositions, pronouns, conjunctions, and particles.
    

    You also won't find these kinds of words in the online version of WordNet.
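
If the goal is to tell "missing because it is a function word" apart from "genuinely unknown", one option is to check WordNet first and then fall back to a list of closed-class words. The following is only a minimal sketch: `CLOSED_CLASS`, `wordnet_has`, and `classify` are hypothetical names, and the word list is a small hand-picked sample taken from the question and the FAQ excerpt above, not an official list. In practice you would probably merge it with NLTK's stopword list.

```python
# Illustrative sample of closed-class words (an assumption, not an
# official list); extend it or merge it with NLTK's stopwords.
CLOSED_CLASS = frozenset({
    "of", "an", "the", "and", "because",
    "could", "since", "without", "although",
})


def wordnet_has(word):
    """Return True if WordNet has at least one synset for `word`."""
    # Imported lazily so the rest of the module loads without NLTK.
    from nltk.corpus import wordnet as wn
    return bool(wn.synsets(word))


def classify(word, lookup=wordnet_has):
    """Label a word as 'wordnet', 'closed-class', or 'unknown'.

    `lookup` is any callable that takes a lowercase word and returns
    True when the word has an entry; it defaults to a WordNet lookup.
    """
    w = word.lower()
    if lookup(w):
        return "wordnet"
    if w in CLOSED_CLASS:
        return "closed-class"
    return "unknown"
```

Passing the lookup in as a parameter keeps the function easy to try out with a stub, for example `classify("could", lookup=lambda w: False)`, before wiring it to the real corpus.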