Tags: java, parsing, nlp, stanford-nlp

How to parse a Penn Treebank tree and get all of its child trees using Stanford NLP?


Is there a way to parse the PTB tree below and get all of its child trees? For example:

Text   :  Today is a nice day.
PTB : (3 (2 Today) (3 (3 (2 is) (3 (2 a) (3 (3 nice) (2 day)))) (2 .)))

I need all possible child trees.

Output  : 
(3 (2 Today) (3 (3 (2 is) (3 (2 a) (3 (3 nice) (2 day)))) (2 .)))
(2 Today)
(3 (3 (2 is) (3 (2 a) (3 (3 nice) (2 day)))) (2 .))
(3 (2 is) (3 (2 a) (3 (3 nice) (2 day))))
(2 is)
(3 (2 a) (3 (3 nice) (2 day)))
(2 a)
(3 (3 nice) (2 day))
(3 nice)
(2 day)
(2 .)

Solution

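  • The input file for this demo should contain one bracketed tree string per line, and its path is passed as the first command-line argument. This example prints the subtrees of the first tree only; bare leaves (the words themselves) are skipped.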

    The Stanford CoreNLP class of interest is Tree.

    import edu.stanford.nlp.trees.*;

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.nio.charset.StandardCharsets;

    public class TreeLoadExample {

        // Prints t and every subtree below it in pre-order, skipping bare leaves (the words).
        public static void printSubTrees(Tree t) {
            if (t.isLeaf()) {
                return;
            }
            System.out.println(t);
            for (Tree subTree : t.children()) {
                printSubTrees(subTree);
            }
        }

        public static void main(String[] args) throws IOException {
            TreeFactory tf = new LabeledScoredTreeFactory();
            // args[0] is the path to the tree file; try-with-resources closes the reader.
            try (Reader r = new BufferedReader(
                    new InputStreamReader(new FileInputStream(args[0]), StandardCharsets.UTF_8))) {
                TreeReader tr = new PennTreeReader(r, tf);
                Tree t = tr.readTree(); // first tree only; null if the file contains no tree
                if (t != null) {
                    printSubTrees(t);
                }
            }
        }
    }
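
To sanity-check the expected output without a file or the CoreNLP jar, the same pre-order listing can be sketched directly over the bracketed string. This is not the Stanford API: `BracketSubtrees` and `subtrees` are hypothetical names, and the sketch assumes every parenthesis in the input is a tree bracket (Penn Treebank style escapes literal parentheses in tokens as -LRB- and -RRB-). Each balanced "( ... )" span is one subtree, and sorting the spans by where they open gives the same order as the recursive printSubTrees above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.TreeMap;

public class BracketSubtrees {

    // Returns every balanced "( ... )" span of a bracketed tree string,
    // ordered by the position of its opening bracket (pre-order).
    // Assumption: all parentheses in the input are tree brackets.
    public static List<String> subtrees(String ptb) {
        Deque<Integer> open = new ArrayDeque<>();            // indices of unmatched '('
        TreeMap<Integer, String> byStart = new TreeMap<>();  // start index -> subtree text
        for (int i = 0; i < ptb.length(); i++) {
            char c = ptb.charAt(i);
            if (c == '(') {
                open.push(i);
            } else if (c == ')') {
                if (open.isEmpty()) {
                    throw new IllegalArgumentException("Unbalanced ')' at index " + i);
                }
                int start = open.pop();
                byStart.put(start, ptb.substring(start, i + 1));
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("Unbalanced '(' at index " + open.peek());
        }
        return new ArrayList<>(byStart.values());
    }

    public static void main(String[] args) {
        String ptb = "(3 (2 Today) (3 (3 (2 is) (3 (2 a) (3 (3 nice) (2 day)))) (2 .)))";
        for (String subtree : subtrees(ptb)) {
            System.out.println(subtree);
        }
    }
}
```

For the sentence in the question this yields 11 subtrees, one per opening bracket. For real work the Tree-based version is the better choice, since it gives access to labels and children rather than raw substrings.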