I want to train a new sentiment model on my own dataset. I know that I need to create a file with sentiment labels for each sentence and for its component phrases and words.
Using BuildBinarizedDataset, I figured out how to create a tree like the following for the sentence “I do not love you.”:
(1 (1 I) (1 (1 (1 (1 do) (1 not)) (1 (1 love) (1 you))) (1 .)))
However, adding labels manually in this format seems terribly difficult, particularly for phrases within a longer sentence. It would be far easier if I could generate the following layout for labeling, then convert it back to the tree format when I am ready to train the new model:
sentiment_score pline1
sentiment_score phrase1
sentiment_score phrase2
...........................
sentiment_score phraseN
BLANK ROW
sentiment_score pline2
The problem is that I can’t figure out how to generate this from a sentence with the parser. If someone could provide guidance, or direct me to documentation that will explain this process, it would help me tremendously.
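To make the target layout concrete, here is a minimal standalone sketch that reads a bracketed tree string like the one above and emits one "score, tab, phrase" line per labeled node, parent first and left to right. It does not use CoreNLP at all; the class name TreeToLabelingLines, its methods, and the tab separator are assumptions made for illustration, and it assumes tokens contain no literal parentheses.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone sketch: turns a bracketed sentiment tree into "label<TAB>phrase" lines. */
class TreeToLabelingLines {

    /** One tree node: a label plus either a single word or a list of children. */
    static final class Node {
        final String label;
        String word; // set only for word-level nodes such as (1 love)
        final List<Node> children = new ArrayList<>();

        Node(String label) { this.label = label; }

        /** The words under this node, joined by single spaces. */
        String phrase() {
            if (word != null) return word;
            StringBuilder sb = new StringBuilder();
            for (Node child : children) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(child.phrase());
            }
            return sb.toString();
        }
    }

    private final String text;
    private int pos = 0;

    private TreeToLabelingLines(String text) { this.text = text; }

    /** Parses one tree such as "(1 (1 I) (1 .))". */
    static Node parse(String tree) {
        TreeToLabelingLines p = new TreeToLabelingLines(tree.trim());
        Node root = p.parseNode();
        if (p.pos != p.text.length()) {
            throw new IllegalArgumentException("Unexpected trailing text at index " + p.pos);
        }
        return root;
    }

    private Node parseNode() {
        expect('(');
        Node node = new Node(readToken());
        skipSpaces();
        if (pos < text.length() && text.charAt(pos) == '(') {
            // Internal node: one or more bracketed children follow the label.
            while (pos < text.length() && text.charAt(pos) == '(') {
                node.children.add(parseNode());
                skipSpaces();
            }
        } else {
            // Word-level node: the label is followed by a single token.
            node.word = readToken();
            skipSpaces();
        }
        expect(')');
        return node;
    }

    private void expect(char c) {
        if (pos >= text.length() || text.charAt(pos) != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at index " + pos);
        }
        pos++;
    }

    private void skipSpaces() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
    }

    private String readToken() {
        skipSpaces();
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))
                && text.charAt(pos) != '(' && text.charAt(pos) != ')') {
            pos++;
        }
        if (start == pos) throw new IllegalArgumentException("Expected a token at index " + pos);
        return text.substring(start, pos);
    }

    /** Returns one "label<TAB>phrase" line per labeled node, parent before children. */
    static List<String> toLines(String tree) {
        List<String> lines = new ArrayList<>();
        collect(parse(tree), lines);
        return lines;
    }

    private static void collect(Node node, List<String> lines) {
        lines.add(node.label + "\t" + node.phrase());
        for (Node child : node.children) collect(child, lines);
    }

    public static void main(String[] args) {
        String tree = "(1 (1 I) (1 (1 (1 (1 do) (1 not)) (1 (1 love) (1 you))) (1 .)))";
        for (String line : toLines(tree)) System.out.println(line);
        System.out.println(); // blank row separating this sentence from the next
    }
}
```

For the example sentence this yields eleven lines, starting with the whole sentence and ending with the final period, which matches the number of opening parentheses in the tree.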
Here is some sample code I wrote that walks a tree and prints every subtree. To get the printout you want, call the printSubTrees method on your sentiment tree and it will print every subtree in it.
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.Word;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.*;

import java.util.ArrayList;
import java.util.Properties;

public class SubTreesExample {

    // Prints the label of inputTree, a tab, and the words it spans,
    // then does the same for every subtree (parent first, left to right).
    public static void printSubTrees(Tree inputTree) {
        // Collect the words under this node, left to right.
        ArrayList<Word> words = new ArrayList<Word>();
        for (Tree leaf : inputTree.getLeaves()) {
            words.addAll(leaf.yieldWords());
        }
        System.out.print(inputTree.label() + "\t");
        for (Word w : words) {
            System.out.print(w.word() + " ");
        }
        System.out.println();
        // Recurse into the children. Leaves are visited too, so each word
        // also gets a line of its own; the recursion stops there.
        for (Tree subTree : inputTree.children()) {
            printSubTrees(subTree);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // lemma and ner are not needed to obtain the parse tree and can be dropped.
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String text = "I do not love you.";
        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation);
        // This tree comes from the parse annotator, so its labels are syntactic
        // categories. Pass a sentiment-labeled tree to printSubTrees instead
        // to get sentiment scores in the first column.
        Tree sentenceTree = annotation.get(CoreAnnotations.SentencesAnnotation.class).get(0).get(
                TreeCoreAnnotations.TreeAnnotation.class);
        printSubTrees(sentenceTree);
    }
}
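Going the other way, once every line has a hand-assigned score, the scores can be written back into the bracketed tree before training. The sketch below is one possible approach, not something CoreNLP provides: it assumes the labeling file lists exactly one line per labeled node, parent before children and left to right, so that the k-th score belongs to the k-th opening parenthesis in the tree string. It also assumes tokens contain no literal parentheses. RelabelTree and relabel are hypothetical names, and the scores in main are made up purely to show the mechanics.

```java
import java.util.Arrays;
import java.util.List;

/** Standalone sketch: writes a list of scores back into a bracketed tree string. */
class RelabelTree {

    /**
     * Replaces the label after each '(' with the next entry of labels.
     * The k-th label is applied to the k-th opening parenthesis.
     */
    static String relabel(String tree, List<String> labels) {
        StringBuilder out = new StringBuilder();
        int next = 0; // index of the next unused label
        int i = 0;
        while (i < tree.length()) {
            char c = tree.charAt(i);
            out.append(c);
            i++;
            if (c == '(') {
                // Skip over the old label, which ends at whitespace or a parenthesis.
                while (i < tree.length()
                        && !Character.isWhitespace(tree.charAt(i))
                        && tree.charAt(i) != '(' && tree.charAt(i) != ')') {
                    i++;
                }
                if (next >= labels.size()) {
                    throw new IllegalArgumentException("Fewer labels than tree nodes");
                }
                out.append(labels.get(next++));
            }
        }
        if (next != labels.size()) {
            throw new IllegalArgumentException("More labels than tree nodes");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String tree = "(1 (1 I) (1 (1 (1 (1 do) (1 not)) (1 (1 love) (1 you))) (1 .)))";
        // Made-up scores, one per node, in the same order as the labeling lines.
        List<String> scores = Arrays.asList("1", "2", "1", "1", "1", "2", "1", "3", "3", "2", "2");
        System.out.println(relabel(tree, scores));
    }
}
```

Matching by position rather than by phrase text avoids ambiguity when the same word or phrase occurs more than once in a sentence, at the cost of requiring that lines are never reordered or deleted during labeling.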