I am working on a project that requires computing the similarity between sentences. Given a sentence, I need its phrase n-gram, defined as 'a combination of the main verb and the noun phrases to the left and right of the verb'. Any idea how to extract this? I have the dependency and constituency parse trees of the sentence, and I am using Python.
Sample Sentence: My dog also likes eating sausage.
Constituency Parse Tree:
(ROOT
  (S
    (NP (PRP$ My) (NN dog))
    (ADVP (RB also))
    (VP (VBZ likes)
      (S
        (VP (VBG eating)
          (NP (NN sausage)))))
    (. .)))
Dependency Graph:
nmod:poss(dog-2, My-1)
nsubj(likes-4, dog-2)
advmod(likes-4, also-3)
root(ROOT-0, likes-4)
xcomp(likes-4, eating-5)
dobj(eating-5, sausage-6)
Main verb : likes
Left Noun Phrase(NP) : My dog
Right Noun Phrase : sausage.
Have you tried Stanford OpenIE? Or, for that matter, any OpenIE system (Ollie / ReVerb / etc.).
Minimal usage (via Simple CoreNLP):
new Sentence("My dog also likes eating sausage.").openieTriples();
Pipeline/server usage:
Create a CoreNLP pipeline and set the annotators to tokenize,ssplit,pos,lemma,depparse,natlog,openie. The Open IE triples are then available on each sentence under the NaturalLogicAnnotations.RelationTriplesAnnotation.class key.
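Since you are using Python, one option is to run the CoreNLP server and query it over HTTP. The following is a minimal sketch, not an official client. It assumes a server is already running at http://localhost:9000 (an assumption, adjust as needed) and that the JSON output lists the triples under an "openie" key on each sentence, with "subject", "relation" and "object" fields. The helper names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Annotators required for Open IE, as listed above.
ANNOTATORS = "tokenize,ssplit,pos,lemma,depparse,natlog,openie"


def build_url(base_url="http://localhost:9000"):
    """Build the server URL with the annotator list and JSON output format."""
    props = {"annotators": ANNOTATORS, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return base_url + "/?" + query


def triples_from_response(doc):
    """Collect (subject, relation, object) tuples from a parsed JSON response."""
    return [
        (t["subject"], t["relation"], t["object"])
        for sentence in doc.get("sentences", [])
        for t in sentence.get("openie", [])
    ]


def openie_triples(text, base_url="http://localhost:9000"):
    """POST the raw text to the server (assumed to be running) and return the triples."""
    request = urllib.request.Request(build_url(base_url), data=text.encode("utf-8"))
    with urllib.request.urlopen(request) as response:
        return triples_from_response(json.load(response))
```

With a server running, `openie_triples("My dog also likes eating sausage.")` should return a list of triples. Which triples you get depends on the Open IE model, so check them against what you expect for your phrase n-gram.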
Try it out at corenlp.run
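If you would rather work directly from the dependency graph you already have, a rough Python sketch might look like the one below. It is a heuristic under stated assumptions, not a general solution. It assumes the dependencies come as text in the `rel(head-i, dep-j)` format shown in the question, it takes the `root` as the main verb, the `nsubj` as the left noun phrase, and the first `dobj` found on the verb or down its `xcomp` chain as the right noun phrase. The function names and the set of modifier relations are illustrative choices.

```python
import re
from collections import defaultdict

# Matches lines such as "nmod:poss(dog-2, My-1)".
DEP_LINE = re.compile(r"^(\S+?)\((\S+)-(\d+), (\S+)-(\d+)\)$")

# Relations treated as part of a noun phrase (an assumption; extend as needed).
NP_MODIFIERS = {"det", "amod", "compound", "nummod", "nmod:poss"}


def parse_dependencies(text):
    """Return (words, children): index -> word, head index -> [(relation, dependent index)]."""
    words, children = {}, defaultdict(list)
    for line in text.strip().splitlines():
        match = DEP_LINE.match(line.strip())
        if not match:
            continue  # skip lines that are not in the expected format
        rel, head_word, head_idx, dep_word, dep_idx = match.groups()
        head_idx, dep_idx = int(head_idx), int(dep_idx)
        words[head_idx], words[dep_idx] = head_word, dep_word
        children[head_idx].append((rel, dep_idx))
    return words, children


def noun_phrase(head, words, children):
    """Join the noun with its direct modifiers, in sentence order."""
    idxs = [head] + [d for rel, d in children.get(head, []) if rel in NP_MODIFIERS]
    return " ".join(words[i] for i in sorted(idxs))


def find_object(verb, children):
    """Return the direct object of the verb, searching through xcomp clauses if needed."""
    for rel, dep in children.get(verb, []):
        if rel in ("dobj", "obj"):
            return dep
    for rel, dep in children.get(verb, []):
        if rel == "xcomp":
            found = find_object(dep, children)
            if found is not None:
                return found
    return None


def phrase_ngram(text):
    """Return (left noun phrase, main verb, right noun phrase); a phrase is None if not found."""
    words, children = parse_dependencies(text)
    root = next((d for rel, d in children.get(0, []) if rel == "root"), None)
    if root is None:
        raise ValueError("no root dependency found")
    subj = next((d for rel, d in children.get(root, []) if rel in ("nsubj", "nsubjpass")), None)
    obj = find_object(root, children)
    left = noun_phrase(subj, words, children) if subj is not None else None
    right = noun_phrase(obj, words, children) if obj is not None else None
    return left, words[root], right


SAMPLE = """
nmod:poss(dog-2, My-1)
nsubj(likes-4, dog-2)
advmod(likes-4, also-3)
root(ROOT-0, likes-4)
xcomp(likes-4, eating-5)
dobj(eating-5, sausage-6)
"""

print(phrase_ngram(SAMPLE))  # ('My dog', 'likes', 'sausage')
```

Sentences with passives, coordination, or prepositional objects will need extra rules, so treat this as a starting point.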