A verbal noun is a noun formed from or otherwise corresponding to a verb.
I want to write an algorithm that, given a noun, returns the corresponding verb (if the input noun is a verbal noun).
My initial thought was to apply a stemmer to the noun, then search a verb list for a verb which has the same stem.
Before doing this, I created a small test data set.
It shows that this approach sometimes fails. For example:
'to explain' and 'explanation' do not have the same stem.
'to decide' and 'decision' do not have the same stem.
from nltk.stem.snowball import SnowballStemmer
stemmer = SnowballStemmer('english')
# (verb, verbal noun) test pairs
pairs = [
    ('to increase', 'increase'),
    ('to inhibit', 'inhibition'),
    ('to activate', 'activation'),
    ('to explain', 'explanation'),
    ('to correlate', 'correlation'),
    ('to decide', 'decision'),
    ('to insert', 'insertion'),
]
for verb, noun in pairs:
    print(stemmer.stem(verb), ' <-> ', stemmer.stem(noun))
#to increas <-> increas
#to inhibit <-> inhibit
#to activ <-> activ
#to explain <-> explan
#to correl <-> correl
#to decid <-> decis
#to insert <-> insert
Does anyone know of a method which will work in cases of derivative nouns that do not have the same stem?
There is no solution that works in all cases, because the set of cases is open-ended: in English, effectively any noun can be "verbed". What you can do is lemmatize your tokens and then call derivationally_related_forms() on NLTK's WordNet lemmas to get all nouns that are derived from a verb. Searching the resulting data structure will give you the corresponding verb in the cases WordNet covers. To reduce the number of verbs you have to check for each noun, you could narrow the candidates with something like the longest common prefix.
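A minimal sketch of that idea, going from the noun side: look up the noun's WordNet lemmas, keep the derivationally related forms that are verbs, and rank them by shared prefix with the noun. The helper names (common_prefix_len, related_verbs, best_verb) are made up for illustration, and using the prefix as a ranking criterion is my own reading of the suggestion above, not an established recipe. It assumes NLTK is installed and the WordNet data has been fetched with nltk.download('wordnet').

```python
import os


def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    return len(os.path.commonprefix([a, b]))


def related_verbs(noun):
    """Collect verb lemma names that WordNet marks as derivationally related to `noun`."""
    # Imported lazily so the prefix helpers work even without NLTK installed.
    from nltk.corpus import wordnet as wn

    verbs = set()
    for lemma in wn.lemmas(noun, pos=wn.NOUN):
        for related in lemma.derivationally_related_forms():
            # Related forms can be nouns or adjectives too; keep verbs only.
            if related.synset().pos() == wn.VERB:
                verbs.add(related.name())
    return verbs


def best_verb(noun, candidates):
    """Return the candidate sharing the longest prefix with `noun`, or None if there are none."""
    if not candidates:
        return None
    # Sorting first makes ties resolve deterministically (alphabetical order).
    return max(sorted(candidates), key=lambda v: common_prefix_len(noun, v))
```

With the WordNet data available, best_verb(noun, related_verbs(noun)) can be run over the nouns in the test set above. Nouns that WordNet does not link to any verb yield None, so this will not cover freshly coined words.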
look at this: