I have a corpus of sentences that were preprocessed with Stanford's CoreNLP. One of the things it provides is each sentence's constituency-based parse tree. While I can understand a parse tree when it's drawn as a tree, I'm not sure how to read it in this format:
(NP (NN sent28))
(: :)
(NP (NNP Rome))
(VP (VBZ is)
(PP (IN in)
(NP (NNP Lazio) (NN province))
(CC and)
(NP (NNP Naples))
(PP (IN in)
(NP (NNP Campania))))))))
(. .)))
The original sentence is:
sent28: Rome is in Lazio province and Naples in Campania .
How am I supposed to read this tree? Alternatively, is there code (in Python) that reads it properly? Thanks.
NLTK has a class for reading parse trees: nltk.tree.Tree. The relevant method is called fromstring. You can then iterate over its subtrees, leaves, etc.
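To make that concrete, here is a minimal sketch. In the bracketed format, each "(LABEL ...)" group is one node: the label comes first and its children follow, so "(NP (NNP Rome))" is an NP node with a single NNP child whose leaf is the word "Rome". The tree string below is a shortened, hand-written example that I balanced myself for illustration; it is not your exact CoreNLP output.

```python
from nltk.tree import Tree

# Hand-written, well-formed bracketed parse used only for illustration;
# it is a shortened stand-in, not the exact CoreNLP output from the question.
bracketed = """
(ROOT
  (S
    (NP (NNP Rome))
    (VP (VBZ is)
      (PP (IN in)
        (NP (NNP Lazio) (NN province))))
    (. .)))
"""

tree = Tree.fromstring(bracketed)

# The words of the sentence, left to right.
words = tree.leaves()

# (word, part-of-speech tag) pairs read from the nodes directly above the leaves.
tagged = tree.pos()

# Every noun phrase in the tree, each as a list of its words.
noun_phrases = [st.leaves() for st in tree.subtrees(lambda t: t.label() == "NP")]

print(words)
print(tagged)
print(noun_phrases)

# Draw the tree as ASCII art in the terminal.
tree.pretty_print()
```

If the string you pass has unbalanced brackets, fromstring raises a ValueError, which is a quick way to check whether a tree was copied in full.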
As an aside: you might want to remove the bit that says sent28: as it confuses the parser (it's also not part of the sentence). You are not getting a full parse tree, but just a sentence fragment.
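One possible way to strip that prefix before the text reaches the parser is sketched below. The helper name split_sentence_id and the assumed ID pattern ("sent" followed by digits and a colon) are mine, based only on the single example in the question, so adjust the pattern to whatever your corpus actually uses.

```python
import re

# Assumed ID format: "sent" followed by digits and a colon, as in "sent28:".
SENT_ID = re.compile(r"^\s*(sent\d+)\s*:\s*")


def split_sentence_id(line):
    """Return (sentence_id, text); sentence_id is None when no prefix is found."""
    match = SENT_ID.match(line)
    if match is None:
        return None, line.strip()
    return match.group(1), line[match.end():].strip()


sent_id, text = split_sentence_id(
    "sent28: Rome is in Lazio province and Naples in Campania ."
)
print(sent_id)
print(text)
```

Keeping the ID next to the cleaned text lets you match each parse back to its sentence afterwards.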