If I take the example from the homepage:
The strongest rain ever recorded in India shut down
the financial hub of Mumbai, snapped communication
lines, closed airports and forced thousands of people
to sleep in their offices or walk home during the night,
officials said today.
The Stanford parser:
LexicalizedParser lexicalizedParser = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
Tree parse = lexicalizedParser.parse(text);
TreePrint treePrint = new TreePrint("penn, typedDependencies");
treePrint.printTree(parse);
This delivers the following tree:
(ROOT
  (S
    (S
      (NP
        (NP (DT The) (JJS strongest) (NN rain))
        (VP
          (ADVP (RB ever))
          (VBN recorded)
          (PP (IN in)
            (NP (NNP India)))))
      (VP
        (VP (VBD shut)
          (PRT (RP down))
          (NP
            (NP (DT the) (JJ financial) (NN hub))
            (PP (IN of)
              (NP (NNP Mumbai)))))
        (, ,)
        (VP (VBD snapped)
          (NP (NN communication) (NNS lines)))
        (, ,)
        (VP (VBD closed)
          (NP (NNS airports)))
        (CC and)
        (VP (VBD forced)
          (NP
            (NP (NNS thousands))
            (PP (IN of)
              (NP (NNS people))))
          (S
            (VP (TO to)
              (VP
                (VP (VB sleep)
                  (PP (IN in)
                    (NP (PRP$ their) (NNS offices))))
                (CC or)
                (VP (VB walk)
                  (NP (NN home))
                  (PP (IN during)
                    (NP (DT the) (NN night))))))))))
    (, ,)
    (NP (NNS officials))
    (VP (VBD said)
      (NP-TMP (NN today)))
    (. .)))
I now want to split the tree according to its structure to get the clauses. In this example I want to split the tree to get the following parts:
The first suggestion was to use a recursive algorithm that prints all root-to-leaf paths.
Here is the code I tried:
public static void main(String[] args) throws IOException {
    LexicalizedParser lexicalizedParser = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
    Tree tree = lexicalizedParser.parse("In a ceremony that was conspicuously short on pomp and circumstance at a time of austerity, Felipe, 46, took over from his father, King Juan Carlos, 76.");
    printAllRootToLeafPaths(tree, new ArrayList<String>());
}

private static void printAllRootToLeafPaths(Tree tree, List<String> path) {
    if(tree != null) {
        if(tree.isLeaf()) {
            path.add(tree.nodeString());
        }
        if(tree.children().length == 0) {
            System.out.println(path);
        } else {
            for(Tree child : tree.children()) {
                printAllRootToLeafPaths(child, path);
            }
        }
        path.remove(tree.nodeString());
    }
}
Of course this code is not logical: it only adds leaves to the path, and since leaves have no children, nothing is ever added before a recursive call. All the real words are leaves, so the algorithm just prints the single words:
[The]
[strongest]
[rain]
[ever]
[recorded]
[in]
[India]
[shut]
[down]
[the]
[financial]
[hub]
[of]
[Mumbai]
[,]
[snapped]
[communication]
[lines]
[,]
[closed]
[airports]
[and]
[forced]
[thousands]
[of]
[people]
[to]
[sleep]
[in]
[their]
[offices]
[or]
[walk]
[home]
[during]
[the]
[night]
[,]
[officials]
[said]
[today]
[.]
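One way the traversal could be repaired, as a minimal sketch: add every node's label on the way down and remove the last entry by index on the way back up, so that a repeated label (for example a VP nested inside a VP) is not removed from the wrong position. The `Node` class below is a hypothetical stand-in for the parser's `Tree`, used so the example runs without the Stanford models; `allPaths` and `collect` are illustrative names too.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RootToLeafPaths {

    // Hypothetical stand-in for edu.stanford.nlp.trees.Tree (assumption, not the real class).
    static class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Returns every root-to-leaf path as a list of labels, in left-to-right order.
    static List<List<String>> allPaths(Node root) {
        List<List<String>> result = new ArrayList<>();
        collect(root, new ArrayList<>(), result);
        return result;
    }

    private static void collect(Node node, List<String> path, List<List<String>> result) {
        if (node == null) {
            return;
        }
        path.add(node.label);                      // push for every node, not only for leaves
        if (node.isLeaf()) {
            result.add(new ArrayList<>(path));     // copy, because path is mutated afterwards
        } else {
            for (Node child : node.children) {
                collect(child, path, result);
            }
        }
        path.remove(path.size() - 1);              // pop by index, not by value
    }

    public static void main(String[] args) {
        // (S (NP (NNS officials)) (VP (VBD said)))
        Node tree = new Node("S",
                new Node("NP", new Node("NNS", new Node("officials"))),
                new Node("VP", new Node("VBD", new Node("said"))));
        for (List<String> path : allPaths(tree)) {
            System.out.println(path);
        }
    }
}
```

For the small tree in `main` this prints `[S, NP, NNS, officials]` and `[S, VP, VBD, said]`. With the real `Tree` class the same push/pop pattern should carry over using the calls already shown above (`children()`, `isLeaf()`, `nodeString()`), though that mapping is not tested here.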
Take a look at print all root to leaf paths in a binary tree or splitting a binary tree:
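The same kind of depth-first walk can also be pointed at the clause question itself. What follows is only a sketch under an assumption that the question does not confirm: that a "clause" is any non-leaf subtree labelled `S` or `SBAR`, and that nested clauses should be listed in addition to the clause that contains them. As before, `Node` is a hypothetical stand-in for the parser's `Tree`, and `clauses` and `words` are made-up helper names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClauseSplitter {

    // Hypothetical stand-in for edu.stanford.nlp.trees.Tree (assumption, not the real class).
    static class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Words (leaf labels) below a node, left to right.
    static List<String> words(Node node) {
        List<String> out = new ArrayList<>();
        collectWords(node, out);
        return out;
    }

    private static void collectWords(Node node, List<String> out) {
        if (node.isLeaf()) {
            out.add(node.label);
            return;
        }
        for (Node child : node.children) {
            collectWords(child, out);
        }
    }

    // Pre-order walk: each non-leaf S or SBAR subtree contributes its words as one clause.
    static List<String> clauses(Node root) {
        List<String> out = new ArrayList<>();
        collectClauses(root, out);
        return out;
    }

    private static void collectClauses(Node node, List<String> out) {
        boolean clauseLabel = node.label.equals("S") || node.label.equals("SBAR");
        if (clauseLabel && !node.isLeaf()) {
            out.add(String.join(" ", words(node)));
        }
        for (Node child : node.children) {
            collectClauses(child, out);
        }
    }

    public static void main(String[] args) {
        // (ROOT (S (S (NP (NNS airports)) (VP (VBD closed))) (, ,) (NP (NNS officials)) (VP (VBD said))))
        Node tree = new Node("ROOT",
                new Node("S",
                        new Node("S",
                                new Node("NP", new Node("NNS", new Node("airports"))),
                                new Node("VP", new Node("VBD", new Node("closed")))),
                        new Node(",", new Node(",")),
                        new Node("NP", new Node("NNS", new Node("officials"))),
                        new Node("VP", new Node("VBD", new Node("said")))));
        for (String clause : clauses(tree)) {
            System.out.println(clause);
        }
    }
}
```

For the tree in `main` this prints `airports closed , officials said` followed by `airports closed`. If the coordinated VPs (shut down, snapped, closed, forced) should count as separate parts as well, the label test would have to be widened, which is a design decision the sketch leaves open.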