I am using spaCy to extract the noun phrases of a text. I want to locate those noun phrases in the text by the token indices of their words.
For instance:
import spacy

# Load the small English pipeline
nlp = spacy.load("en_core_web_sm")
doc = nlp("The blue car is nicer than the white car")
noun_chunks = list(doc.noun_chunks)
for i, noun_chunk in enumerate(noun_chunks):
    # j is the token's position inside the chunk, not inside the doc
    for j, token in enumerate(noun_chunk):
        print(i, noun_chunk, j, token.text)
The value j is the index of token.text within the span of the noun chunk, but I want the token.i value of the first and last word of each noun_chunk.
In the example, the two noun chunks are "The blue car" and "the white car".
The desired output would be:
tokens: The 1 blue 2 car 3 is 4 nicer 5 than 6 the 7 white 8 car 9
noun chunk 1: "the blue car"; starts 1, ends 3
noun chunk 2: "the white car"; starts 7, ends 9
With the start and end of a noun chunk, I will be able to identify the span of the noun chunk in the doc.
Thanks
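I did not know about the start and end attributes of a chunk.
chunk.start gives you the index of the chunk's first token in the doc. chunk.end gives you the index one past the chunk's last token, so the last token is at chunk.end - 1. Both count from 0, so add 1 to get the numbering used in the desired output.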
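Here is a minimal sketch of how those two attributes could be turned into the numbering from the question. The helper name chunk_bounds and its one_based flag are made up for illustration, and the script assumes the en_core_web_sm model is installed. The exact chunks found depend on the model.

```python
from typing import Iterable, List, Tuple


def chunk_bounds(chunks: Iterable, one_based: bool = True) -> List[Tuple[str, int, int]]:
    """Return (text, first_token, last_token) for each chunk.

    Only the .text, .start and .end attributes of a spaCy Span are used.
    .end is exclusive, so the last token of the chunk sits at .end - 1.
    """
    # spaCy counts tokens from 0; shift by 1 for the question's numbering
    offset = 1 if one_based else 0
    return [(c.text, c.start + offset, c.end - 1 + offset) for c in chunks]


def main() -> None:
    # Imported here so the helper above does not require spaCy by itself
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("The blue car is nicer than the white car")
    bounds = chunk_bounds(doc.noun_chunks)
    for n, (text, first, last) in enumerate(bounds, start=1):
        print(f'noun chunk {n}: "{text}"; starts {first}, ends {last}')


if __name__ == "__main__":
    main()
```

The same doc-level positions should also be available from the tokens themselves, as chunk[0].i and chunk[-1].i, and doc[chunk.start:chunk.end] gives back the chunk's span.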