How can I effectively build a sentiment model training dataset using Stanford CoreNLP?

I’m interested in training a new sentiment model on my own dataset. I know that I need to create a file in which each sentence, along with its component phrases and words, is labeled with a sentiment score.

I figured out how to use BuildBinarizedDataset to create a tree like the following for the sentence “I do not love you.”:

(1 (1 I) (1 (1 (1 (1 do) (1 not)) (1 (1 love) (1 you))) (1 .)))
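
Each pair of parentheses in this format is one node. The token right after "(" is that node's label, and the words inside the parentheses are the phrase it covers. The following is a minimal stand-alone sketch in plain Java with no CoreNLP dependency. The class name TreeToFlat and the method toFlat are hypothetical. It reads such a string and produces one "label<TAB>phrase" line per node, whole sentence first. It assumes that no token contains parentheses or whitespace, as is usual for PTB-style trees that escape brackets as -LRB-/-RRB-.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TreeToFlat {

    // Hypothetical helper: turns one bracketed tree such as "(1 (1 I) (1 love))"
    // into "label<TAB>phrase" lines, one per bracket pair, in pre-order.
    public static List<String> toFlat(String tree) {
        // Split into "(", ")" and bare tokens (labels or words).
        String[] tokens = tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        List<String> labels = new ArrayList<>();
        List<List<String>> phrases = new ArrayList<>();
        Deque<Integer> open = new ArrayDeque<>(); // output slots of brackets not yet closed
        for (int i = 0; i < tokens.length; i++) {
            String t = tokens[i];
            if (t.equals("(")) {
                // The token after "(" is the node label. Reserving the output slot
                // here, at the opening bracket, lists parents before children.
                labels.add(tokens[++i]);
                phrases.add(new ArrayList<>());
                open.push(labels.size() - 1);
            } else if (t.equals(")")) {
                open.pop();
            } else {
                // A word belongs to the phrase of every bracket that is still open.
                for (int slot : open) {
                    phrases.get(slot).add(t);
                }
            }
        }
        List<String> lines = new ArrayList<>();
        for (int k = 0; k < labels.size(); k++) {
            lines.add(labels.get(k) + "\t" + String.join(" ", phrases.get(k)));
        }
        return lines;
    }

    public static void main(String[] args) {
        String tree = "(1 (1 I) (1 (1 (1 (1 do) (1 not)) (1 (1 love) (1 you))) (1 .)))";
        toFlat(tree).forEach(System.out::println);
    }
}
```

For the example tree this prints eleven lines. The first covers the whole sentence "I do not love you .", and the last covers the final ".".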

However, adding labels manually in this format seems very difficult, particularly for phrases within a longer sentence. It would be far easier if I could generate the following listing for labeling purposes and then convert it back to the tree format when I am ready to train the new model.

sentiment_score pline1
sentiment_score  phrase1
sentiment_score  phrase2
...
sentiment_score  phraseN
BLANK ROW
sentiment_score pline2
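
Going the other way, once the listing has been edited, the scores could be written back onto the tree that BuildBinarizedDataset produced. To do this, match each node's phrase against the edited lines. The sketch below is hypothetical: the class FlatToTree and its methods are not part of CoreNLP. It assumes the phrases in the listing use the same tokenization as the tree's leaves (for example, "." as its own token). It also assumes that a phrase occurring twice in one sentence should receive the same score. Nodes whose phrase is not listed keep their existing label.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FlatToTree {

    // Hypothetical helper: reads "score<whitespace>phrase" lines into a phrase -> score map.
    // Blank lines (the separator between sentences) are skipped.
    static Map<String, String> readScores(List<String> flatLines) {
        Map<String, String> scores = new HashMap<>();
        for (String line : flatLines) {
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length == 2) {
                // Normalize inner whitespace so "love  you" matches "love you".
                scores.put(String.join(" ", parts[1].trim().split("\\s+")), parts[0]);
            }
        }
        return scores;
    }

    // Rewrites node labels in a bracketed tree using the edited scores.
    public static String applyScores(String tree, List<String> flatLines) {
        Map<String, String> scores = readScores(flatLines);
        String[] tokens = tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");

        // Pass 1: collect the phrase (yield) of every bracket, in pre-order.
        List<List<String>> phrases = new ArrayList<>();
        Deque<Integer> open = new ArrayDeque<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals("(")) {
                i++; // skip the label; pass 2 handles it
                phrases.add(new ArrayList<>());
                open.push(phrases.size() - 1);
            } else if (tokens[i].equals(")")) {
                open.pop();
            } else {
                for (int slot : open) {
                    phrases.get(slot).add(tokens[i]);
                }
            }
        }

        // Pass 2: re-emit the tree, swapping in new labels where the phrase was scored.
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (int i = 0; i < tokens.length; i++) {
            String t = tokens[i];
            if (t.equals("(")) {
                String oldLabel = tokens[++i];
                String phrase = String.join(" ", phrases.get(next++));
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append('(').append(scores.getOrDefault(phrase, oldLabel));
            } else if (t.equals(")")) {
                out.append(')');
            } else {
                out.append(' ').append(t);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String tree = "(1 (1 I) (1 (1 (1 (1 do) (1 not)) (1 (1 love) (1 you))) (1 .)))";
        List<String> edited = Arrays.asList("1\tI do not love you .", "3\tlove you", "2\tnot", "");
        System.out.println(applyScores(tree, edited));
    }
}
```

Here only "not" and "love you" receive new scores, so the printed tree differs from the input only at those two nodes: (1 (1 I) (1 (1 (1 (1 do) (2 not)) (3 (1 love) (1 you))) (1 .))). Because blank lines are ignored, the block for one sentence can be passed per call.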

The problem is that I can’t figure out how to generate this listing from a sentence using the parser. If someone could provide guidance or point me to documentation that explains this process, it would help me tremendously.

Solution

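Here is some sample code I wrote that walks a tree and prints every subtree. To get the listing you want, call the printSubTrees method on your sentiment tree. It prints each subtree's label followed by the words that subtree spans. Note that the main method below runs only the parser, so the labels it prints are syntactic categories (ROOT, S, NP, ...), not sentiment scores.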

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.*;

import java.util.ArrayList;
import java.util.Properties;

public class SubTreesExample {

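    // Prints every subtree of inputTree in pre-order, one per line, as
    // "label<TAB>words spanned by the subtree", then recurses into the children.
    public static void printSubTrees(Tree inputTree) {
        // yieldWords() returns the leaves under this node, left to right.
        ArrayList<Word> words = inputTree.yieldWords();
        // value() is the bare node label ("NP", "S", ... for a parse tree; the score
        // for a tree read from a labeled sentiment file). The string form of
        // label() may also include extra information such as a word index.
        System.out.print(inputTree.value() + "\t");
        for (Word w : words) {
            System.out.print(w.word() + " ");
        }
        System.out.println();
        for (Tree subTree : inputTree.children()) {
            printSubTrees(subTree);
        }
    }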

    public static void main(String[] args) {
        // Run a pipeline whose parse annotator builds a constituency tree per sentence.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String text = "I do not love you.";
        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation);
        // Take the parse tree of the first (and only) sentence and print all of its subtrees.
        Tree sentenceTree = annotation.get(CoreAnnotations.SentencesAnnotation.class).get(0).get(
                TreeCoreAnnotations.TreeAnnotation.class);
        printSubTrees(sentenceTree);
    }
}