python nlp nltk spacy

What's the most convenient way to analyze a sentence's phrases and structure using NLTK or spaCy?


[Image: parse tree of an example sentence]

What's the most convenient way to analyze the phrases and structure of a sentence using NLTK or spaCy? The main goal is to get clean, well-organized data so that I can apply inferential statistics to it.

Here is a simple example of what I need, as shown in the tree above:

  • NP, a Noun Phrase
  • VP, a Verb Phrase
  • ADJP, Adjective Phrase
  • -, a coordinating conjunction, implying that it is a compound sentence
  • PP, a Prepositional Phrase

Solution

The most convenient way is to use spaCy's dependency parser: https://spacy.io/usage/linguistic-features#dependency-parse. From its output you can extract whatever information you need. Keep in mind that no parser is perfectly accurate, so prefer one of the larger models, which generally produce better parses.
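As a rough illustration of pulling tabular data out of a dependency parse, here is a minimal sketch. The helper names (`load_doc`, `token_rows`, `subtree_phrases`) are made up for this example, and the model name `en_core_web_sm` is an assumption: use whichever installed model suits you. Note that a dependency parse does not label constituents such as VP or ADJP directly, so treating a token's subtree as a "phrase" is only an approximation.

```python
from typing import Dict, Iterable, List


def load_doc(text: str, model: str = "en_core_web_sm"):
    """Parse `text` with spaCy. The model name is an assumption and must be installed."""
    import spacy  # imported lazily so the helpers below can be used on any parsed doc

    nlp = spacy.load(model)
    return nlp(text)


def token_rows(doc) -> List[Dict[str, str]]:
    """Return one flat record per token, ready to load into a table for statistics."""
    return [
        {
            "text": tok.text,
            "pos": tok.pos_,        # coarse part-of-speech tag
            "dep": tok.dep_,        # dependency relation to the head
            "head": tok.head.text,  # the token this one attaches to
        }
        for tok in doc
    ]


def subtree_phrases(doc, dep_labels: Iterable[str]) -> List[Dict[str, str]]:
    """Collect the subtree text of every token whose dependency label is in `dep_labels`."""
    wanted = set(dep_labels)
    rows = []
    for tok in doc:
        if tok.dep_ in wanted:
            # token.subtree yields the token and all its descendants in sentence order
            phrase = " ".join(t.text for t in tok.subtree)
            rows.append({"dep": tok.dep_, "head": tok.head.text, "phrase": phrase})
    return rows
```

With a model installed, `doc = load_doc("some sentence")` followed by `token_rows(doc)` gives per-token records, `subtree_phrases(doc, {"prep"})` gives a rough stand-in for prepositional phrases (the label name `prep` depends on the model's label scheme), and `[chunk.text for chunk in doc.noun_chunks]` gives base noun phrases.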