python nlp nltk lemmatization

How to invert the lemmatization process given a lemma and a token?


Generally, in natural language processing, we want to get the lemma of a token.

For example, we can map 'eaten' to 'eat' using WordNet lemmatization.

Are there any tools in Python that can do the inverse, turning a lemma into a given form?

For example, map 'go' to 'gone', given 'eaten' as an example of the target form.

PS: Someone mentions that we have to store such mappings; see "How to un-stem a word in Python?"


Solution

  • Turning a base form such as a lemma into a situation-appropriate form is called realization (or "surface realization"). Example from Wikipedia:

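    // Setup (not shown in the Wikipedia snippet): lexicon, factory, realiser
    Lexicon lexicon = Lexicon.getDefaultLexicon();
    NLGFactory nlgFactory = new NLGFactory(lexicon);
    Realiser realiser = new Realiser(lexicon);

    NPPhraseSpec subject = nlgFactory.createNounPhrase("the", "woman");
    subject.setPlural(true);  // "the woman" -> "the women"
    SPhraseSpec sentence = nlgFactory.createClause(subject, "smoke");
    sentence.setFeature(Feature.NEGATED, true);  // "smoke" -> "do not smoke"
    System.out.println(realiser.realiseSentence(sentence));
    // output: "The women do not smoke."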
    

    Libraries for this are used less often than lemmatizers, so you have fewer options and are less likely to find a well-developed one. The Wikipedia example is in Java because the most popular library supporting this is SimpleNLG, which is a Java library.

    A quick search turned up pynlg, though it does not seem to be actively maintained. Alternatively, you can use SimpleNLG through the HTTP JSON interface provided by the Python library nlgserv.
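
If a full realiser is more than you need, the stored-mappings idea mentioned in the question can be sketched in plain Python. This is only an illustration under assumptions: the `INFLECTIONS` table, the Penn Treebank style tags used as keys, and the helper names `form_of` and `inflect_like` are all made up for this example and do not come from any library. A real table would have to be built from a tagged corpus or a morphological lexicon.

```python
# Hypothetical hand-written table: lemma -> {form tag -> surface form}.
# The tags follow Penn Treebank conventions (VBD = past, VBN = past participle).
INFLECTIONS = {
    "eat": {"VB": "eat", "VBZ": "eats", "VBG": "eating", "VBD": "ate", "VBN": "eaten"},
    "go": {"VB": "go", "VBZ": "goes", "VBG": "going", "VBD": "went", "VBN": "gone"},
}


def form_of(token, table=INFLECTIONS):
    """Return the set of tags under which `token` appears in the table."""
    tags = set()
    for forms in table.values():
        for tag, surface in forms.items():
            if surface == token:
                tags.add(tag)
    return tags


def inflect_like(lemma, example_token, table=INFLECTIONS):
    """Inflect `lemma` into the same form(s) as `example_token`.

    Returns a sorted list of candidates, because an example token can be
    ambiguous (one surface form may carry several tags). The list is empty
    when the lemma or the example token is not in the table.
    """
    forms = table.get(lemma, {})
    tags = form_of(example_token, table)
    return sorted({forms[tag] for tag in tags if tag in forms})


print(inflect_like("go", "eaten"))  # ['gone']
```

The lookup first recovers the form of the example token ('eaten' is stored as VBN), then reads the same slot for the requested lemma. It only covers words that are in the table, which is the main limitation of this approach compared with a rule-based realiser such as SimpleNLG.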