I am implementing a few string replacers with these conversions in mind:
'thou sittest' → 'you sit'
'thou walkest' → 'you walk'
'thou liest' → 'you lie'
'thou risest' → 'you rise'
If I keep it naive, I can use a regex such as thou [a-z]+est to find and replace. The trouble comes with English verbs that end in e: depending on the word, I need to trim est in some cases and only st in the rest.
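A minimal sketch of that naive approach, stripping a fixed est suffix with re.sub, shows where it breaks: only walkest comes out right.

```python
import re

# Naive version: capture the stem before a literal 'est' and drop the suffix.
NAIVE = re.compile(r"\bthou ([a-z]+)est\b")

for phrase in ["thou sittest", "thou walkest", "thou liest", "thou risest"]:
    print(phrase, "->", NAIVE.sub(r"you \1", phrase))

# thou sittest -> you sitt
# thou walkest -> you walk
# thou liest -> you li
# thou risest -> you ris
```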
What is a quick-and-dirty way to achieve this?
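Probably the quickest and dirtiest: strip the final st, then keep dropping the last character until what remains is a word in NLTK's word list.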
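import nltk

nltk.download('words')  # one-time download of the word list corpus
words = set(nltk.corpus.words.words())
for old in 'sittest walkest liest risest'.split():
    new = old[:-2]  # drop the trailing 'st'
    while new and new not in words:
        new = new[:-1]  # shorten until a dictionary word remains
    print(old, new)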
Output:
sittest sit
walkest walk
liest lie
risest rise
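To plug the trimming loop into the original thou ... est replacement, it can serve as a re.sub callback. This is a minimal sketch, assuming a small hard-coded vocabulary (the hypothetical KNOWN_WORDS) in place of the NLTK word list so it runs without downloading a corpus; base_form and modernize are illustrative names, not NLTK functions.

```python
import re

# Hypothetical stand-in for nltk.corpus.words.words(); use the real list in practice.
KNOWN_WORDS = {"sit", "walk", "lie", "rise"}

def base_form(verb_est: str) -> str:
    """Trim a 'thou' verb form such as 'sittest' back to a known word."""
    new = verb_est[:-2]  # drop the trailing 'st' first
    while new and new not in KNOWN_WORDS:
        new = new[:-1]  # keep dropping the last character until a known word remains
    return new or verb_est  # fall back to the original form if nothing matched

def modernize(text: str) -> str:
    # \b stops the pattern from matching inside longer words; matching is case-sensitive
    return re.sub(r"\bthou ([a-z]+est)\b",
                  lambda m: "you " + base_form(m.group(1)),
                  text)

for phrase in ["thou sittest", "thou walkest", "thou liest", "thou risest"]:
    print(phrase, "->", modernize(phrase))

# thou sittest -> you sit
# thou walkest -> you walk
# thou liest -> you lie
# thou risest -> you rise
```

Phrases without an est verb, such as thou art, are left unchanged because the pattern does not match them.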
UPDATE. A slightly less quick-and-dirty version, which accepts only verbs (so it works, e.g., for rotest → the verb rot, not the noun rote):
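import nltk
from nltk.corpus import wordnet as wn

nltk.download('wordnet')  # one-time download of the WordNet corpus
for old in 'sittest walkest liest risest rotest'.split():
    new = old[:-2]  # drop the trailing 'st'
    while new and not wn.synsets(new, pos='v'):  # accept only strings WordNet knows as verbs
        new = new[:-1]
    print(old, new)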
Output:
sittest sit
walkest walk
liest lie
risest rise
rotest rot
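The two versions differ only in the membership test, so the trimming loop can be written once with the check passed in. A minimal sketch, assuming two small hypothetical sets: ANY_WORD stands in for the NLTK word list and VERBS for the test "wn.synsets(word, pos='v') is non-empty"; trim_est is an illustrative name.

```python
from typing import Callable

# Hypothetical stand-ins: ANY_WORD plays the role of nltk.corpus.words,
# VERBS the role of "WordNet has at least one verb synset for this string".
ANY_WORD = {"sit", "walk", "lie", "rise", "rot", "rote"}
VERBS = {"sit", "walk", "lie", "rise", "rot"}

def trim_est(old: str, is_known: Callable[[str], bool]) -> str:
    new = old[:-2]  # drop the trailing 'st'
    while new and not is_known(new):
        new = new[:-1]  # shorten until the check accepts the remainder
    return new  # empty string if nothing was accepted

for old in ["sittest", "walkest", "liest", "risest", "rotest"]:
    print(old,
          trim_est(old, lambda w: w in ANY_WORD),
          trim_est(old, lambda w: w in VERBS))

# sittest sit sit
# walkest walk walk
# liest lie lie
# risest rise rise
# rotest rote rot
```

Restricting the check to verbs is what makes the search skip the noun rote and continue down to rot.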