I'm working on a sentence-completion task. I have the subject, the verb, and an adverb or object, and all I need is the appropriate preposition in between. Is there an NLP tool that can give a distribution over the prepositions that can go with a given verb?
Here's how to count, across the Brown corpus, every preposition that immediately follows a verb, and then look up the counts for the verb "go". First, the counts:
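```python
import nltk
from nltk.corpus import brown

# One-time data setup: nltk.download("brown"); nltk.download("universal_tagset")
# For every (verb, next word) bigram where the next word is an adposition,
# count the adposition under the verb's word form.
prepchoices = nltk.ConditionalFreqDist((v[0], p[0])
    for (v, p) in nltk.bigrams(brown.tagged_words(tagset="universal"))
    if v[1] == "VERB" and p[1] == "ADP")
```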
"ADP" stands for "adposition", i.e. a preposition or postposition. Now let's look at what we've got:
```python
>>> prepchoices["go"]
FreqDist({'to': 96, 'with': 20, 'into': 18, 'through': 8, 'on': 8, 'for': 7,
'in': 5, 'out': 4, 'around': 4, 'from': 4, ...})
```
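Since the question asks for a distribution rather than raw counts, one option is to divide each verb's counts by their total, which gives a relative-frequency (maximum-likelihood) estimate of P(preposition | verb). The sketch below does this in plain Python so it runs without the Brown corpus; the small tagged list is made-up toy data standing in for `brown.tagged_words(tagset="universal")`, and `prep_counts` / `prep_distribution` are hypothetical helper names, not NLTK functions.

```python
from collections import Counter, defaultdict

# Toy (word, universal-tag) pairs standing in for brown.tagged_words(tagset="universal").
tagged = [
    ("They", "PRON"), ("go", "VERB"), ("to", "ADP"), ("school", "NOUN"),
    ("and", "CONJ"), ("go", "VERB"), ("with", "ADP"), ("friends", "NOUN"), (".", "."),
    ("We", "PRON"), ("go", "VERB"), ("to", "ADP"), ("work", "NOUN"), (".", "."),
    ("She", "PRON"), ("looked", "VERB"), ("at", "ADP"), ("it", "PRON"),
]

def prep_counts(tagged_words):
    """Count adpositions (ADP) that immediately follow each verb."""
    counts = defaultdict(Counter)
    for (w1, t1), (w2, t2) in zip(tagged_words, tagged_words[1:]):
        if t1 == "VERB" and t2 == "ADP":
            counts[w1][w2] += 1
    return counts

def prep_distribution(counts, verb):
    """Relative frequencies, i.e. an MLE estimate of P(preposition | verb)."""
    verb_counts = counts.get(verb, Counter())  # .get avoids adding an empty entry
    total = sum(verb_counts.values())
    if total == 0:
        return {}
    return {prep: n / total for prep, n in verb_counts.items()}

print(prep_distribution(prep_counts(tagged), "go"))  # to: 2/3, with: 1/3
```

With the real NLTK object, `prepchoices["go"].freq("to")` returns the same kind of relative frequency directly from the `FreqDist`.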
You can get the top choices, in descending order of frequency, with `most_common()`:
```python
>>> print(prepchoices["go"].most_common(5))
[('to', 96), ('with', 20), ('into', 18), ('through', 8), ('on', 8)]
```
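For the sentence-completion task itself, a simple approach is to fill the gap with the verb's most frequent prepositions. This sketch hard-codes only the ten "go" counts printed above (the full `FreqDist` has more entries) and uses a hypothetical `complete` helper. It relies on `collections.Counter.most_common`, which behaves like the `FreqDist.most_common` call above because NLTK's `FreqDist` is a `Counter` subclass.

```python
from collections import Counter

# Only the ten "go" counts printed above; the full FreqDist is longer.
go_counts = Counter({"to": 96, "with": 20, "into": 18, "through": 8, "on": 8,
                     "for": 7, "in": 5, "out": 4, "around": 4, "from": 4})

def complete(subject, verb, obj, counts_by_verb, k=3):
    """Hypothetical helper: fill 'subject verb ___ obj' with the k most frequent prepositions."""
    verb_counts = counts_by_verb.get(verb)
    if not verb_counts:
        return []  # unseen verb: no candidates
    return [f"{subject} {verb} {prep} {obj}" for prep, _ in verb_counts.most_common(k)]

for sentence in complete("We", "go", "the park", {"go": go_counts}):
    print(sentence)
# We go to the park
# We go with the park
# We go into the park
```

Because these counts are conditioned only on the verb, the object plays no part in the ranking, which is why "with the park" comes second. Taking the object into account would need counts over verb-preposition-noun combinations.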
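I didn't do any stemming or lemmatization of the verbs ("goes" and "went" are counted separately from "go"), or even case-folding. You could add either, but the counts above should already give you a decent picture of the distribution. Note also that only a preposition directly after the verb is counted, so a phrase like "go quickly to" contributes nothing to the counts for "go".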
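Case-folding and merging inflected forms can both be done before counting. The sketch below lowercases every token and maps verb forms through a small hand-written lemma table; `VERB_LEMMAS`, `normalize_verb`, and the tagged toy data are hypothetical, for illustration only. For real text, NLTK's `WordNetLemmatizer().lemmatize(word, pos="v")` is one option, assuming the WordNet data has been downloaded.

```python
from collections import Counter, defaultdict

# Hypothetical lemma table for illustration; real code would use a lemmatizer.
VERB_LEMMAS = {"goes": "go", "went": "go", "going": "go", "gone": "go"}

def normalize_verb(word):
    """Case-fold, then map known inflected forms to their lemma."""
    w = word.lower()
    return VERB_LEMMAS.get(w, w)

def prep_counts(tagged_words):
    """Count verb -> following-adposition pairs after normalizing both words."""
    counts = defaultdict(Counter)
    for (w1, t1), (w2, t2) in zip(tagged_words, tagged_words[1:]):
        if t1 == "VERB" and t2 == "ADP":
            counts[normalize_verb(w1)][w2.lower()] += 1
    return counts

# Toy tagged text: "went", a title-cased "Goes ... To", and plain "go".
tagged = [
    ("He", "PRON"), ("went", "VERB"), ("to", "ADP"), ("town", "NOUN"), (".", "."),
    ("She", "PRON"), ("Goes", "VERB"), ("To", "ADP"), ("Work", "NOUN"), (".", "."),
    ("They", "PRON"), ("go", "VERB"), ("with", "ADP"), ("us", "PRON"),
]

counts = prep_counts(tagged)
print(counts["go"])  # Counter({'to': 2, 'with': 1})
```

All three surface forms end up under the single key "go", so the per-verb totals are no longer split across inflections and capitalizations.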