I'm very new to spaCy. I have been reading the documentation for hours and I still can't tell whether what I'm asking is possible. Anyway...
As the title says, is there a way to get the noun chunk that contains a given token? For example, given the sentence:
"Autonomous cars shift insurance liability toward manufacturers"
Would it be possible to get the "autonomous cars" noun chunk when I only have the "cars" token? Here is an example snippet of the scenario that I'm trying to go for.
import spacy

nlp = spacy.load("en_core_web_md")  # assumed model; the question does not say which one was loaded
startingSentence = "Autonomous cars and magic wands shift insurance liability toward manufacturers"
doc = nlp(startingSentence)
noun_chunks = doc.noun_chunks
for token in doc:
    if token.dep_ == "dobj":
        print(token)  # this will print "liability"
        # Is it possible to do anything from here to actually get the "insurance liability" noun chunk?
Any help will be greatly appreciated. Thanks!
You can easily find the noun chunk that contains the token you've identified by checking if the token is in one of the noun chunk spans:
doc = nlp("Autonomous cars and magic wands shift insurance liability toward manufacturers")
interesting_token = doc[7]  # or however you identify the token you want
for noun_chunk in doc.noun_chunks:
    if interesting_token in noun_chunk:
        print(noun_chunk)
The output is not correct with en_core_web_sm and spaCy 2.0.18 because shift isn't identified as a verb, so you get:
magic wands shift insurance liability
With en_core_web_md, it's correct:
insurance liability
(It makes sense for the documentation to include examples with real ambiguities, since that's a realistic scenario (https://spacy.io/usage/linguistic-features#noun-chunks), but it's confusing for new users when the examples are ambiguous enough that the analysis is unstable across versions and models.)
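If you need to look up the chunk for many tokens, looping over doc.noun_chunks each time gets repetitive. One option is to build a lookup from token index to chunk once and reuse it. The following is only a rough sketch in plain Python, so it runs without a model: the helper names build_chunk_index and chunk_text are made up for illustration, and the chunk boundaries are written by hand to match what a correct parse would give. With spaCy you would take them from the parse instead, using Span.start, Span.end and Token.i.

```python
from typing import Dict, Iterable, Optional, Tuple


def build_chunk_index(chunk_bounds: Iterable[Tuple[int, int]]) -> Dict[int, Tuple[int, int]]:
    """Map each covered token index to the (start, end) bounds of its chunk."""
    index: Dict[int, Tuple[int, int]] = {}
    for start, end in chunk_bounds:
        # end is exclusive, the same convention as spaCy's Span.end
        for i in range(start, end):
            index[i] = (start, end)
    return index


# Hand-written stand-in for a parsed sentence (no model is loaded here).
words = "Autonomous cars and magic wands shift insurance liability toward manufacturers".split()

# Assumed chunk boundaries. With spaCy these would come from:
#   [(chunk.start, chunk.end) for chunk in doc.noun_chunks]
chunk_bounds = [(0, 2), (3, 5), (6, 8), (9, 10)]
index = build_chunk_index(chunk_bounds)


def chunk_text(token_index: int) -> Optional[str]:
    """Return the text of the chunk covering token_index, or None if there is none."""
    bounds = index.get(token_index)
    if bounds is None:
        return None
    start, end = bounds
    return " ".join(words[start:end])


print(chunk_text(7))  # index of "liability"; with spaCy, use token.i
print(chunk_text(5))  # index of "shift", which is not inside any chunk
```

Noun chunks don't overlap, so each token index maps to at most one chunk, and tokens outside every chunk (verbs, conjunctions, prepositions) simply have no entry. The result is still only as good as the parse, so the model differences described above apply here too.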