python nlp nltk

Get all leaf words for a stemmed keyword


I am looking for something like un-stemming. Is there a way to get a list of all possible words that share a common stem? Something like:

>>> get_leaf_words('play')
['player', 'play', 'playing' ... ]

Solution

  • Solution to the above question: the word_forms library, https://github.com/gutfeeling/word_forms (thanks to @Divyanshu Srivastava).

    >>> from word_forms.word_forms import get_word_forms
    >>> get_word_forms("president")
    {'n': {'presidents', 'presidentships', 'presidencies', 'presidentship', 'president', 'presidency'},
     'a': {'presidential'},
     'v': {'preside', 'presided', 'presiding', 'presides'},
     'r': {'presidentially'}}
    >>> get_word_forms("elect")
    {'n': {'elects', 'electives', 'electors', 'elect', 'eligibilities', 'electorates', 'eligibility', 'elector', 'election', 'elections', 'electorate', 'elective'},
     'a': {'eligible', 'electoral', 'elective', 'elect'},
     'v': {'electing', 'elects', 'elected', 'elect'},
     'r': set()}
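
To get one flat list, like the `get_leaf_words` output the question asks for, the per-part-of-speech sets can be merged. This is only a minimal sketch: `flatten_forms` is a hypothetical helper (it is not part of word_forms), and it assumes the input is a dict that maps part-of-speech tags to sets of strings, as in the `get_word_forms` output above.

```python
def flatten_forms(forms_by_pos):
    """Merge a {pos_tag: set_of_words} mapping into one sorted list."""
    merged = set()
    for words in forms_by_pos.values():
        merged |= set(words)
    return sorted(merged)


# Sample data copied from the get_word_forms("president") output above,
# so this sketch runs without the word_forms package installed.
president_forms = {
    'n': {'presidents', 'presidentships', 'presidencies',
          'presidentship', 'president', 'presidency'},
    'a': {'presidential'},
    'v': {'preside', 'presided', 'presiding', 'presides'},
    'r': {'presidentially'},
}

all_forms = flatten_forms(president_forms)
print(all_forms)
```

With the library installed, the same helper should work directly on its result, e.g. `flatten_forms(get_word_forms("president"))`.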
    


    Previous Answer:

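    Reverse stemming is not possible in general, because most stemmers derive the base form by applying a rule set to the original word. That step discards information: many different words collapse to the same stem, and the stem alone does not say which words produced it.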

    But there is reverse lemmatization, which is called realization (or "surface realization").

    You can use one of the publicly available lemmatization datasets/dictionaries to do that.

    Example: https://raw.githubusercontent.com/richardwilly98/elasticsearch-opennlp-auto-tagging/master/src/main/resources/models/en-lemmatizer.dict [Apache OpenNLP]
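
A dictionary like this can be inverted to go from a lemma back to its surface forms. The sketch below assumes each line holds a word, a POS tag, and a lemma separated by tabs; that layout is an assumption, so check the file before relying on it. `invert_lemma_dict` is a hypothetical helper name, and the sample lines are illustrative rather than copied from the file.

```python
from collections import defaultdict


def invert_lemma_dict(lines):
    """Map each lemma to the set of surface forms listed for it.

    Assumes tab-separated lines of the form: word<TAB>pos_tag<TAB>lemma.
    Lines that do not have exactly three fields are skipped.
    """
    forms_by_lemma = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue
        word, _pos_tag, lemma = fields
        forms_by_lemma[lemma].add(word)
    return forms_by_lemma


# Illustrative sample in the assumed format (not taken from the real file).
sample_lines = [
    "play\tVB\tplay\n",
    "plays\tVBZ\tplay\n",
    "playing\tVBG\tplay\n",
    "played\tVBD\tplay\n",
    "player\tNN\tplayer\n",
    "a malformed line\n",
]

lemma_index = invert_lemma_dict(sample_lines)
print(sorted(lemma_index["play"]))
```

Any iterable of lines works, so an open file object can be passed in place of `sample_lines`. Note that in this sample "player" has its own lemma: a lemma dictionary recovers inflected forms, not derived words.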

    I could not find a direct library in Python but found one in Java (pynlg)

    Furthermore, if you have enough original words, you can build a reverse dictionary for lemmatization or stemming yourself.
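
A reverse dictionary of that kind could be built by stemming every word in a vocabulary and grouping the words by their stem. The following is a minimal sketch, not a definitive implementation: `build_reverse_index` and `toy_stem` are hypothetical names, and `toy_stem` is a deliberately crude stand-in so the example runs without any dependencies.

```python
from collections import defaultdict


def toy_stem(word):
    """Crude stand-in stemmer (an assumption): strip one common suffix."""
    for suffix in ("ing", "ers", "er", "ed", "s"):
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_reverse_index(words, stem=toy_stem):
    """Group lowercased words by the stem that `stem` assigns to them."""
    index = defaultdict(set)
    for word in words:
        lowered = word.lower()
        index[stem(lowered)].add(lowered)
    return index


vocabulary = ["play", "player", "players", "playing", "played", "plays", "plan"]
stem_index = build_reverse_index(vocabulary)
print(sorted(stem_index["play"]))
```

Any callable that maps a word to its stem can be passed as `stem`; with NLTK installed, `PorterStemmer().stem` from `nltk.stem` is one option. The index can only return words that were in the vocabulary it was built from, so its coverage depends entirely on how many original words you have.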