Search code examples
nltkwordnetpart-of-speech

Convert words between part of speech, when wordnet doesn't do it


There are a lot of Q&A about part-of-speech conversion, and they pretty much all point to WordNet derivationally_related_forms() (For example, Convert words between verb/noun/adjective forms)

However, I'm finding that the WordNet data on this has important gaps. For example, I can find no relation at all between 'succeed', 'success', 'successful' which seem like they should be V/N/A variants on the same concept. Likewise none of the lemmatizers I've tried seem to see these as related, although I can get snowball stemmer to turn 'failure' into 'failur' which isn't really much help.

So my questions are:

  1. Are there any other (programmatic, ideally python) tools out there that do this POS-conversion, which I should check out? (The WordNet hits are masking every attempt I've made to google alternatives.)
  2. Failing that, are there ways to submit additions to WordNet despite the "due to lack of funding" situation they're presently in? (Or, can we set up a crowdfunding campaign?)
  3. Failing that, are there straightforward ways to distribute supplementary corpus to users of nltk that augments the WordNet data where needed?

Solution

  • (Asking for software/data recommendations is off-topic for StackOverflow; but I have tried to give a more general "approach" answer.)

    1. Another approach to finding related words would be one of the machine learning approaches. If you are dealing with words in isolation, look at word embeddings such as GloVe or Word2Vec. Spacy and gensim have libraries for working with them, though I'm also getting some search hits for tutorials of working with them in nltk.

    2/3. One of the (in my opinion) core reasons for the success of Princeton WordNet was the liberal license they used. That means you can branch the project, add your extra data, and redistribute.

    You might also find something useful at http://globalwordnet.org/resources/global-wordnet-grid/ Obviously most of them are not for English, but there are a few multilingual ones in there, that might be worth evaluating?

    Another approach would be to create a wrapper function. It first searches a lookup list of fixes and additions you think should be in there. If not found then it searches WordNet as normal. This allows you to add 'succeed', 'success', 'successful', and then other sets of words as end users point out something missing.