Tags: nlp, nltk, stanford-nlp, spacy

How does one extract the verb phrase in spaCy?


For example:

Ultimate Swirly Ice Cream Scoopers are usually overrated when one considers all of the scoopers one could buy.

Here I'd like to pluck:

  • Subject: "Ultimate Swirly Ice Cream Scoopers"
  • Adverbial Clause: "When one considers all of the scoopers one could buy"
  • Verb Phrase: "are usually overrated"

I have the following functions for subject, object, and adverbial clause:

import spacy

def get_subj(decomp):
    for token in decomp:
        if ("subj" in token.dep_):
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return str(decomp[start:end])

def get_obj(decomp):
    for token in decomp:
        if ("dobj" in token.dep_ or "pobj" in token.dep_):
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return str(decomp[start:end])

def get_advcl(decomp):
    for token in decomp:
        # print(f"pos: {token.pos_}; lemma: {token.lemma_}; dep: {token.dep_}")
        if ("advcl" in token.dep_):
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return str(decomp[start:end])

phrase = "Ultimate Swirly Ice Cream Scoopers are usually overrated when one considers all of the scoopers one could buy."

nlp = spacy.load("en_core_web_sm")
decomp = nlp(phrase)

subj = get_subj(decomp)
obj = get_obj(decomp)
advcl = get_advcl(decomp)

print("subj: ", subj)
print("obj: ", obj)
print("advcl: ", advcl)

Output:

subj:  Ultimate Swirly Ice Cream Scoopers
obj:  all of the scoopers
advcl:  when one considers all of the scoopers one could buy

However, the actual dependency label (.dep_) of the final word of the VP, "are usually overrated", is "ROOT".

So the subtree technique fails here, because the subtree of the ROOT token is the entire sentence.


Solution

  • What you want is closer to a “verb group”: keep the root verb together with only certain close dependents, such as aux, cop, and advmod, and leave out dependents such as nsubj, obj, or advcl.
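
A minimal sketch of that idea, in the same style as the functions in the question. The names get_verb_group and VERB_GROUP_DEPS are hypothetical, and the set of labels to keep is an assumption: label names depend on the model (spaCy's English models use labels such as auxpass, neg, and dobj, while cop and obj are Universal Dependencies names), so adjust the set to whatever your parser emits. The function only needs a parsed document whose tokens expose .dep_, .children, .i, and .text, such as decomp above.

```python
# Labels kept next to the root verb. This particular set is an assumption;
# edit it to match the labels your model actually produces.
VERB_GROUP_DEPS = frozenset({"aux", "auxpass", "cop", "neg", "advmod", "prt"})

def get_verb_group(decomp, keep_deps=VERB_GROUP_DEPS):
    """Return the ROOT token plus selected direct children as a string."""
    for token in decomp:
        if token.dep_ == "ROOT":
            # Keep only direct children with a whitelisted label, so
            # nsubj, dobj/obj, advcl, and punct are left out.
            group = [token]
            group.extend(c for c in token.children if c.dep_ in keep_deps)
            # Restore sentence order before joining the token texts.
            group.sort(key=lambda t: t.i)
            return " ".join(t.text for t in group)
    # No ROOT token found (for example, an empty document).
    return None
```

Because only direct children of the root are inspected, modifiers nested deeper in the tree (for example, the "very" in "very often") are dropped, and the "when" inside the adverbial clause is never considered. If the parse matches the one described in the question, calling get_verb_group(decomp) should give something like "are usually overrated", but the exact result depends on how the model attaches each token.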