Tags: python, nlp, stemming, text-processing

Processing English verbs ending with 'e'


I am implementing a few string replacers with these conversions in mind:

'thou sittest' → 'you sit'
'thou walkest' → 'you walk'
'thou liest' → 'you lie'
'thou risest' → 'you rise'

A naive approach would be a regex find-and-replace on a pattern like thou [a-z]+est.

The trouble is verbs whose base form ends in 'e': depending on the verb, I need to trim 'est' in some cases (walkest → walk) but only 'st' in others (liest → lie, risest → rise).
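For reference, here is a minimal sketch of that naive replacement. The pattern \bthou ([a-z]+)est\b is an assumed refinement of thou [a-z]+est that captures the stem. Always stripping 'est' handles walkest but produces wrong stems for the other three examples:

```python
import re

# Naive approach: keep everything before the final 'est' as the stem.
NAIVE = re.compile(r'\bthou ([a-z]+)est\b')

def naive_modernize(text: str) -> str:
    return NAIVE.sub(r'you \1', text)

for phrase in ['thou sittest', 'thou walkest', 'thou liest', 'thou risest']:
    print(phrase, '->', naive_modernize(phrase))
# thou sittest -> you sitt
# thou walkest -> you walk
# thou liest -> you li
# thou risest -> you ris
```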

What is a quick-and-dirty way to achieve this?


Solution

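  • Probably the quickest and dirtiest approach: drop the final 'st', then keep removing characters from the end until what remains is a word in NLTK's English word list: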

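    import nltk

    nltk.download('words', quiet=True)       # one-time download of the word list
    words = set(nltk.corpus.words.words())

    for old in 'sittest walkest liest risest'.split():
        new = old[:-2]                       # drop the trailing 'st'
        while new and new not in words:      # trim until a known word remains
            new = new[:-1]
        print(old, new)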
    

    Output:

    sittest sit
    walkest walk
    liest lie
    risest rise
    

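    UPDATE: A slightly less quick-and-dirty version accepts only strings that WordNet knows as verbs. It also handles e.g. rotest → rot (the verb) rather than rote (the noun), which the word-list version would return: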

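    import nltk
    from nltk.corpus import wordnet as wn

    nltk.download('wordnet', quiet=True)     # one-time download of WordNet
    for old in 'sittest walkest liest risest rotest'.split():
        new = old[:-2]                       # drop the trailing 'st'
        while new and not wn.synsets(new, pos='v'):  # stop at the first string with a verb sense
            new = new[:-1]
        print(old, new)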
    

    Output:

    sittest sit
    walkest walk
    liest lie
    risest rise
    rotest rot
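To plug either lookup back into the original find-and-replace, the trimming loop can run inside an re.sub callback. The following is a minimal sketch. modernize and KNOWN_VERBS are hypothetical names, and the tiny hard-coded verb set only stands in for the NLTK lookup so the example runs without downloads:

```python
import re
from typing import Callable

# Matches 'thou' followed by an archaic second-person form ending in 'est'.
THOU_EST = re.compile(r'\bthou ([a-z]+est)\b')

def modernize(text: str, is_verb: Callable[[str], bool]) -> str:
    """Rewrite 'thou <verb>est' as 'you <verb>' using the trimming heuristic."""
    def repl(match):
        old = match.group(1)          # e.g. 'sittest'
        new = old[:-2]                # drop the trailing 'st'
        while new and not is_verb(new):
            new = new[:-1]            # trim until the lookup accepts the string
        # If nothing was accepted, keep the original form rather than emit 'you '.
        return 'you ' + (new or old)
    return THOU_EST.sub(repl, text)

# Hypothetical stand-in for a real dictionary lookup.
KNOWN_VERBS = {'sit', 'walk', 'lie', 'rise', 'rot'}

print(modernize('thou sittest and thou risest', KNOWN_VERBS.__contains__))
# you sit and you rise
```

With NLTK installed, you could pass lambda w: bool(wn.synsets(w, pos='v')) as is_verb instead. This is still the same heuristic as above: it returns the longest string (after dropping 'st') that the lookup accepts.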