For example:
Ultimate Swirly Ice Cream Scoopers are usually overrated when one considers all of the scoopers one could buy.
Here I'd like to pluck:
I have the following functions for subject, object, and adverbial clause:
import spacy


def get_subj(decomp):
    # Return the text of the first subject subtree (nsubj, nsubjpass, csubj, ...).
    for token in decomp:
        if ("subj" in token.dep_):
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return str(decomp[start:end])


def get_obj(decomp):
    # Return the text of the first direct object or prepositional object subtree.
    for token in decomp:
        if ("dobj" in token.dep_ or "pobj" in token.dep_):
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return str(decomp[start:end])


def get_advcl(decomp):
    # Return the text of the first adverbial clause subtree.
    for token in decomp:
        # print(f"pos: {token.pos_}; lemma: {token.lemma_}; dep: {token.dep_}")
        if ("advcl" in token.dep_):
            subtree = list(token.subtree)
            start = subtree[0].i
            end = subtree[-1].i + 1
            return str(decomp[start:end])


phrase = "Ultimate Swirly Ice Cream Scoopers are usually overrated when one considers all of the scoopers one could buy."

nlp = spacy.load("en_core_web_sm")
decomp = nlp(phrase)

subj = get_subj(decomp)
obj = get_obj(decomp)
advcl = get_advcl(decomp)

print("subj: ", subj)
print("obj: ", obj)
print("advcl: ", advcl)
Output:
subj: Ultimate Swirly Ice Cream Scoopers
obj: all of the scoopers
advcl: when one considers all of the scoopers one could buy
However, the actual dependency type (.dep_) for the final word of the VP, "are usually overrated", is "ROOT". So the subtree technique fails: every other token in the sentence descends from ROOT, so the subtree of ROOT is the entire sentence.
What you want to construct is something more like a “verb group”: keep with the root verb only certain close dependents such as aux, cop, and advmod, and leave out ones like nsubj, obj, or advcl.
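A minimal sketch of that idea follows. It makes several assumptions that are not in the original post: the function name get_verb_group is hypothetical, the KEEP_DEPS set is a guess to tune (auxpass and neg are added because spaCy's English models use those labels), and only direct dependents of the root are kept, not their subtrees. To stay runnable without a model download, the demo uses a small stand-in Tok class with hand-assigned labels; it mirrors the spaCy token attributes .i, .text, .dep_, and .children, so the function should also accept a real spaCy Doc.

```python
from dataclasses import dataclass, field

# Labels kept next to the root verb. This set is an assumption; adjust it
# to the label scheme your parser actually emits.
KEEP_DEPS = {"aux", "auxpass", "cop", "advmod", "neg"}


def get_verb_group(tokens):
    """Return the ROOT token plus its direct dependents labeled in KEEP_DEPS."""
    for token in tokens:
        if token.dep_ == "ROOT":
            group = [token] + [c for c in token.children if c.dep_ in KEEP_DEPS]
            group.sort(key=lambda t: t.i)  # restore sentence order
            return " ".join(t.text for t in group)
    return None  # no ROOT found


@dataclass
class Tok:
    """Hypothetical stand-in for a spaCy token, used only for this demo."""
    i: int
    text: str
    dep_: str
    children: list = field(default_factory=list)


# Hand-annotated fragment of the example sentence (labels assumed, not model output).
scoopers = Tok(0, "Scoopers", "nsubjpass")
are = Tok(1, "are", "auxpass")
usually = Tok(2, "usually", "advmod")
considers = Tok(4, "considers", "advcl")  # head of the "when ..." clause
overrated = Tok(3, "overrated", "ROOT", [scoopers, are, usually, considers])
tokens = [scoopers, are, usually, overrated, considers]

print("vp: ", get_verb_group(tokens))
```

With a real parse you would call get_verb_group(nlp(phrase)) instead. The labels the model assigns may differ from the hand-written ones here, so print token.dep_ for the root's children before settling on KEEP_DEPS.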