Tags: python, nlp, spacy, pos-tagger, dependency-parsing

How to get indices of words in a spaCy dependency parse?


I am trying to use spaCy to extract word relations/dependencies, but I am a little unsure how to use the information it gives me. I understand how to generate the visual dependency tree for debugging.

Specifically, I don’t see a way to map each entry in a token's list of children back to a specific token in the sentence. There is no index, just a list of words.

Looking at the example here: https://spacy.io/usage/linguistic-features#dependency-parse

nlp("Autonomous cars shift insurance liability toward manufacturers")

Also, if the sentence were nlp("Autonomous cars shift insurance liability toward manufacturers of cars"), how would I disambiguate between the two instances of cars?

The only thing I can think of is that maybe these tokens are actually reference types that I can map to indices myself. Is that the case?

Basically, I am looking to start with getting the predicates and args to understand “who did what to whom and how/using what”.


Solution

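  • Yeah, when you print a token it looks like a string. It’s not. It’s a Token object with tons of metadata, including token.i, which is the token's position in the Doc and the index you are looking for. The children of a token are Token objects too, so two occurrences of the same word (like cars) have different token.i values.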

    If you’re just getting started with spaCy, the best use of your time is the course. It’s quick and practical.
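
To make the point about token.i concrete, here is a minimal sketch, assuming spaCy v3 (where Doc accepts heads as absolute token indices). To keep it runnable without downloading a trained model, the parse is written by hand for illustration; the heads and deps lists are my assumption, not actual parser output. With a real pipeline you would call nlp(text) on a loaded model and use the same attributes.

```python
import spacy
from spacy.tokens import Doc

# A blank pipeline provides a vocab without downloading a trained model.
nlp = spacy.blank("en")

words = ["Autonomous", "cars", "shift", "insurance", "liability",
         "toward", "manufacturers", "of", "cars"]
# Hand-written parse (an assumption for illustration, not model output).
# heads[i] is the absolute index of token i's head; the root points to itself.
heads = [1, 2, 2, 4, 2, 2, 5, 6, 7]
deps = ["amod", "nsubj", "ROOT", "compound", "dobj",
        "prep", "pobj", "prep", "pobj"]

doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps)

# Every token knows its own index (token.i), its head, and its children.
for token in doc:
    child_indices = [child.i for child in token.children]
    print(token.i, token.text, token.dep_, token.head.i, child_indices)

# The two "cars" tokens are distinct objects with distinct indices.
cars_indices = [t.i for t in doc if t.text == "cars"]

# A first step toward "who did what to whom": read the root's arguments.
root = next(t for t in doc if t.dep_ == "ROOT")
subjects = [(c.i, c.text) for c in root.children if c.dep_ == "nsubj"]
objects = [(c.i, c.text) for c in root.children if c.dep_ == "dobj"]
print(root.i, root.text, subjects, objects)
```

Because children are Token objects rather than strings, child.i and child.head.i let you tie each relation back to an exact position in the sentence, which is what separates the first cars from the second.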