java jar stanford-nlp sentiment-analysis

How to get overall sentiment for multiple sentences


How do you find the aggregated sentiment of multiple sentences, a paragraph, or a large passage of text?

I have the code below, which I based on the Stanford CoreNLP tests on GitHub and various examples, but everything I've found computes the sentiment only for individual sentences. I want the overall tweet's sentiment, regardless of how many sentences it contains.

The only other way I can think of is to run SentimentPipeline.main(String[]) in a separate thread, feed the text to stdin, and collect the overall sentiment from stdout. I would prefer to use my own code, since that would be simpler and more efficient, but I haven't found a way.

Also, I don't want to make a system call to a jar, as most people do, because I will be processing millions of tweets per day. The overhead of loading the resources each time would be too great.

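import java.util.List;

import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.CoreMap;

// `pipeline` and `text` are defined elsewhere
Annotation document = new Annotation(text);
pipeline.annotate(document);

List<CoreMap> sentences = document.get(SentencesAnnotation.class);
String output;
for (CoreMap sentence : sentences) {
    // traverse the words in the current sentence;
    // a CoreLabel is a CoreMap with additional token-specific methods
    output = "";
    for (CoreLabel token : sentence.get(TokensAnnotation.class)) {

        // this is the text of the token
        String word = token.get(TextAnnotation.class);

        // this is the part-of-speech tag of the token (noun, verb, adjective etc.)
        // String pos = token.get(PartOfSpeechAnnotation.class);

        // this is the NER label of the token ("O" means it is not an entity)
        String ne = token.get(NamedEntityTagAnnotation.class);
        if (ne != null && !ne.equals("O")) {
            output = output + (ne + " " + word + " ");
        }
    }

    // ************** Sentiment analysis
    // getPredictedClass returns an int class index, not a String
    // (0 = very negative ... 4 = very positive with the default five-class model)
    Tree tree = sentence.get(SentimentCoreAnnotations.AnnotatedTree.class);
    int sentiment = RNNCoreAnnotations.getPredictedClass(tree);
}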

Solution

  • The sentiment analysis toolkit in Stanford CoreNLP is trained on a sentence-level data set. If you need a document-level sentiment engine, I think training a new model on documents is the better choice. You can also process the sentences one by one and combine the per-sentence results with simple heuristics (such as the average or the max) as baselines to see how well they work.
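
The baseline heuristics mentioned above could be sketched roughly as follows. This is only an illustration under assumptions, not part of CoreNLP: `SentimentAggregator` and its methods are hypothetical names, the inputs are assumed to be the integer classes 0 (very negative) to 4 (very positive) that `RNNCoreAnnotations.getPredictedClass(tree)` yields for each sentence, and "max" is interpreted here as the class farthest from neutral. The length-weighted variant is an extra guess at a reasonable baseline.

```java
import java.util.List;

/**
 * Illustrative helper (hypothetical, not a CoreNLP class): combines
 * per-sentence sentiment classes into one document-level score.
 * Classes are assumed to run from 0 (very negative) to 4 (very positive).
 */
final class SentimentAggregator {

    private static final int NEUTRAL = 2;

    private SentimentAggregator() {}

    /** Plain mean of the sentence classes; neutral (2.0) for empty input. */
    static double average(List<Integer> classes) {
        if (classes.isEmpty()) {
            return NEUTRAL;
        }
        double sum = 0;
        for (int c : classes) {
            sum += c;
        }
        return sum / classes.size();
    }

    /** Mean weighted by sentence length, so longer sentences count more. */
    static double weightedAverage(List<Integer> classes, List<Integer> lengths) {
        if (classes.size() != lengths.size()) {
            throw new IllegalArgumentException("classes and lengths differ in size");
        }
        double weighted = 0;
        long total = 0;
        for (int i = 0; i < classes.size(); i++) {
            weighted += classes.get(i) * (double) lengths.get(i);
            total += lengths.get(i);
        }
        return total == 0 ? NEUTRAL : weighted / total;
    }

    /** The class farthest from neutral; on ties the earliest sentence wins. */
    static int mostExtreme(List<Integer> classes) {
        int best = NEUTRAL;
        for (int c : classes) {
            if (Math.abs(c - NEUTRAL) > Math.abs(best - NEUTRAL)) {
                best = c;
            }
        }
        return best;
    }
}
```

Inside the sentence loop from the question, each predicted class (and, for the weighted variant, the token count of the sentence) would be appended to a list, and one of these methods called once per tweet after the loop. The pipeline is still built only once, so no resources are reloaded per tweet.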