python, graph, nlp, stanford-nlp

Extracting phrase n-grams from a sentence corresponding to the main verb


I am working on a project that requires computing the similarity between sentences. Given a sentence, I need its phrase n-gram, defined as 'a combination of the main verb and the noun phrases to the left and right of the verb'. Any idea how to extract this? I have both the dependency and constituency parse trees of the sentence, and I am using Python.

Sample Sentence: My dog also likes eating sausage.
Constituency Parse Tree:
(ROOT
  (S
    (NP (PRP$ My) (NN dog))
    (ADVP (RB also))
    (VP (VBZ likes)
      (S
        (VP (VBG eating)
          (NP (NN sausage)))))
    (. .)))

Dependency Graph:
nmod:poss(dog-2, My-1)
nsubj(likes-4, dog-2)
advmod(likes-4, also-3)
root(ROOT-0, likes-4)
xcomp(likes-4, eating-5)
dobj(eating-5, sausage-6)

Expected output:

Main verb: likes

Left Noun Phrase (NP): My dog

Right Noun Phrase (NP): sausage
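
For reference, the three items above can be read off the dependency graph directly: the dependent of the root relation is the main verb, the nsubj subtree is the left NP, and the right NP is the first dobj found on the verb or further down its xcomp chain. Below is a minimal Python sketch of that reading. It assumes the dependencies are available as text in the format shown above. The names parse_dependencies and phrase_ngram are hypothetical, and the heuristic has only been checked against this one sentence, so it will likely need more relations (passives, prepositional objects, conjunctions) for real data.

```python
import re
from collections import defaultdict

# One dependency per line, e.g. "nsubj(likes-4, dog-2)".
DEP_LINE = re.compile(r"^([\w:]+)\((\S+)-(\d+), (\S+)-(\d+)\)$")

SAMPLE_DEPS = """
nmod:poss(dog-2, My-1)
nsubj(likes-4, dog-2)
advmod(likes-4, also-3)
root(ROOT-0, likes-4)
xcomp(likes-4, eating-5)
dobj(eating-5, sausage-6)
"""


def parse_dependencies(text):
    """Return ({index: word}, [(relation, head_index, dependent_index), ...])."""
    words, edges = {}, []
    for line in text.strip().splitlines():
        match = DEP_LINE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognised lines
        rel, head, head_idx, dep, dep_idx = match.groups()
        head_idx, dep_idx = int(head_idx), int(dep_idx)
        words[head_idx], words[dep_idx] = head, dep
        edges.append((rel, head_idx, dep_idx))
    return words, edges


def subtree(idx, children):
    """Collect idx and every token below it in the dependency tree."""
    found = [idx]
    for _, child in children[idx]:
        found.extend(subtree(child, children))
    return found


def phrase_ngram(text):
    """Return (left NP, main verb, right NP); an NP is None if not found."""
    words, edges = parse_dependencies(text)
    children = defaultdict(list)
    for rel, head, dep in edges:
        children[head].append((rel, dep))

    # Main verb: the token attached to ROOT-0.
    root = next(dep for rel, _, dep in edges if rel == "root")

    # Left NP: head of the subject of the main verb.
    left = next((d for r, d in children[root] if r.startswith("nsubj")), None)

    # Right NP: direct object of the main verb, otherwise walk down xcomp.
    verb, right = root, None
    while verb is not None and right is None:
        right = next((d for r, d in children[verb] if r in ("dobj", "obj")), None)
        if right is None:
            verb = next((d for r, d in children[verb] if r == "xcomp"), None)

    def span(idx):
        if idx is None:
            return None
        return " ".join(words[i] for i in sorted(subtree(idx, children)))

    return span(left), words[root], span(right)


print(phrase_ngram(SAMPLE_DEPS))  # ('My dog', 'likes', 'sausage')
```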


Solution

  • Have you tried Stanford OpenIE? Or, for that matter, any OpenIE system (Ollie / ReVerb / etc.).

    Minimal usage (via Simple CoreNLP):

    new Sentence("My dog also likes eating sausage.").openieTriples();

    Pipeline/server usage:

    Create a CoreNLP pipeline and set the annotators to tokenize,ssplit,pos,lemma,depparse,natlog,openie. The Open IE triples are then stored on each sentence under the NaturalLogicAnnotations.RelationTriplesAnnotation.class key.

    Try it out at corenlp.run
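
Since the question is in Python, the server route can also be driven over HTTP instead of through the Java API. The sketch below uses only the standard library and assumes a CoreNLP server is already running locally on port 9000. The helper names (build_url, triples_from_response, openie_triples) are hypothetical, and the JSON field names (sentences, openie, subject, relation, object) are what the server's JSON output is expected to contain, so check them against your CoreNLP version.

```python
import json
import urllib.parse
import urllib.request

# Annotator list taken from the answer above.
ANNOTATORS = "tokenize,ssplit,pos,lemma,depparse,natlog,openie"


def build_url(base_url="http://localhost:9000"):
    """Build the server URL with the pipeline properties as a query parameter."""
    props = {"annotators": ANNOTATORS, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return base_url.rstrip("/") + "/?" + query


def triples_from_response(doc):
    """Flatten the parsed JSON response into (subject, relation, object) tuples."""
    return [
        (t["subject"], t["relation"], t["object"])
        for sentence in doc.get("sentences", [])
        for t in sentence.get("openie", [])
    ]


def openie_triples(text, base_url="http://localhost:9000"):
    """POST raw text to the server and return its Open IE triples."""
    request = urllib.request.Request(build_url(base_url), data=text.encode("utf-8"))
    with urllib.request.urlopen(request) as response:
        return triples_from_response(json.load(response))


# Requires a running server, so it is left commented out here:
# print(openie_triples("My dog also likes eating sausage."))
```

Each returned triple is a subject, relation, object tuple, so the subject and object roughly correspond to the left and right phrases around the verb.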