How do you find the aggregated sentiment of multiple sentences, a paragraph, or a large passage of text?
I have the code below, which I based on the Stanford CoreNLP tests on GitHub and various other examples, but every example I've found that performs sentiment analysis only computes the sentiment of individual sentences. I want the overall sentiment of a tweet, regardless of how many sentences it contains.
The only other way I can think of is to run `SentimentPipeline.main(String[])` in a separate thread, feed the text to its stdin, and collect the overall sentiment from stdout. I would prefer to do this directly in my own code to keep it simpler and more efficient, but I haven't found a way.
Also, I don't want to make a system call to a jar, as most people do, because I will be processing millions of tweets per day and reloading the models for every call would add too much overhead.
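```java
import java.util.List;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.CoreMap;

// `pipeline` is a StanfordCoreNLP instance created once and reused, with annotators
// such as "tokenize, ssplit, pos, lemma, ner, parse, sentiment"; `text` is the tweet
Annotation document = new Annotation(text);
pipeline.annotate(document);
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    // traverse the words in the current sentence; a CoreLabel is a CoreMap with additional token-specific methods
    StringBuilder output = new StringBuilder();
    for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
        // this is the text of the token
        String word = token.get(TextAnnotation.class);
        // this is the part-of-speech tag of the token (noun, verb, adjective, etc.)
        // String pos = token.get(PartOfSpeechAnnotation.class);
        // this is the NER label of the token
        String ne = token.get(NamedEntityTagAnnotation.class);
        if (ne != null && !ne.equals("O")) {
            output.append(ne).append(' ').append(word).append(' ');
        }
    }
    // Sentiment analysis (newer CoreNLP releases name this key SentimentAnnotatedTree)
    Tree tree = sentence.get(SentimentCoreAnnotations.AnnotatedTree.class);
    // getPredictedClass returns an int: 0 = very negative ... 4 = very positive
    int sentiment = RNNCoreAnnotations.getPredictedClass(tree);
}
```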
The sentiment analysis toolkit in Stanford CoreNLP is trained on a sentence-level dataset. If you need a document-level sentiment engine, I think training a new model on documents is the better choice. You could also process the sentences one by one and use simple heuristics (such as the average or the maximum) as baselines to see how well they work.
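A minimal sketch of those baselines, assuming you collect `RNNCoreAnnotations.getPredictedClass(tree)` for each sentence into a `List<Integer>` inside the sentence loop. `SentimentAggregator` and its methods are hypothetical helpers, not CoreNLP API. "Max" is interpreted here as the class of the most polarized sentence (furthest from neutral), which is only one possible reading, and the token-weighted mean is an additional baseline along the same lines.

```java
import java.util.List;

// Hypothetical helper (not part of CoreNLP): turns per-sentence sentiment classes
// from RNNCoreAnnotations.getPredictedClass (0 = very negative ... 4 = very positive)
// into one document-level class using simple baselines.
class SentimentAggregator {

    static final int NEUTRAL = 2;

    // Mean of the sentence classes, rounded to the nearest class (x.5 rounds up).
    static int average(List<Integer> classes) {
        if (classes.isEmpty()) {
            return NEUTRAL; // assumption: a text with no sentences is neutral
        }
        double sum = 0;
        for (int c : classes) {
            sum += c;
        }
        return (int) Math.round(sum / classes.size());
    }

    // Mean weighted by each sentence's token count, so longer sentences count more.
    static int weightedAverage(List<Integer> classes, List<Integer> tokenCounts) {
        if (classes.size() != tokenCounts.size()) {
            throw new IllegalArgumentException("need one token count per sentence");
        }
        double sum = 0;
        double total = 0;
        for (int i = 0; i < classes.size(); i++) {
            sum += classes.get(i) * tokenCounts.get(i);
            total += tokenCounts.get(i);
        }
        return total == 0 ? NEUTRAL : (int) Math.round(sum / total);
    }

    // "Max" read as: the class of the most polarized sentence (furthest from neutral);
    // on a tie the earlier sentence wins.
    static int mostPolarized(List<Integer> classes) {
        int best = NEUTRAL;
        for (int c : classes) {
            if (Math.abs(c - NEUTRAL) > Math.abs(best - NEUTRAL)) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Example: four sentences with these predicted classes and token counts.
        List<Integer> classes = List.of(3, 2, 1, 4);
        List<Integer> tokens = List.of(10, 5, 20, 3);
        System.out.println(average(classes));                 // 3 (mean 2.5 rounds up)
        System.out.println(weightedAverage(classes, tokens));  // 2 (72 / 38, about 1.89)
        System.out.println(mostPolarized(classes));            // 4
    }
}
```

Inside the sentence loop from the question you would add `classes.add(RNNCoreAnnotations.getPredictedClass(tree))` and, for the weighted variant, `tokenCounts.add(sentence.get(TokensAnnotation.class).size())`, then call one of these methods after the loop. Which baseline works best for tweets is an empirical question, so it is worth comparing them on a labeled sample.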