Polarity calculation in Sentiment Analysis using TextBlob


How is the polarity of a word in a sentence calculated by TextBlob's PatternAnalyzer?


Solution

  • TextBlob's NaiveBayesAnalyzer uses a Naive Bayes classifier for sentiment analysis, and that classifier is in turn the one provided by NLTK. (The default PatternAnalyzer is lexicon-based and does not go through this code.)

    See the TextBlob sentiment analyzer code here.

        @requires_nltk_corpus
        def train(self):
            """Train the Naive Bayes classifier on the movie review corpus."""
            super(NaiveBayesAnalyzer, self).train()
            neg_ids = nltk.corpus.movie_reviews.fileids('neg')
            pos_ids = nltk.corpus.movie_reviews.fileids('pos')
            neg_feats = [(self.feature_extractor(
                nltk.corpus.movie_reviews.words(fileids=[f])), 'neg') for f in neg_ids]
            pos_feats = [(self.feature_extractor(
                nltk.corpus.movie_reviews.words(fileids=[f])), 'pos') for f in pos_ids]
            train_data = neg_feats + pos_feats
    
            #### THE CLASSIFIER USED IS NLTK's NAIVE BAYES #####
    
            self._classifier = nltk.classify.NaiveBayesClassifier.train(train_data)
    
        def analyze(self, text):
            """Return the sentiment as a named tuple of the form:
            ``Sentiment(classification, p_pos, p_neg)``
            """
            # Lazily train the classifier
            super(NaiveBayesAnalyzer, self).analyze(text)
            tokens = word_tokenize(text, include_punc=False)
            filtered = (t.lower() for t in tokens if len(t) >= 3)
            feats = self.feature_extractor(filtered)
    
            #### USE PROB_CLASSIFY method of NLTK classifier #####
    
            prob_dist = self._classifier.prob_classify(feats)
            return self.RETURN_TYPE(
                classification=prob_dist.max(),
                p_pos=prob_dist.prob('pos'),
                p_neg=prob_dist.prob("neg")
            )
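
To make the p_pos and p_neg values above more concrete, here is a minimal standard-library sketch of how a Naive Bayes classifier can turn word-presence features into a normalized probability per label. It is a simplified stand-in, not NLTK's or TextBlob's implementation: `ToyNaiveBayes`, `extract_features`, the add-one smoothing, and the four-sentence training set are all assumptions made for illustration (TextBlob trains on the movie review corpus instead).

```python
import math
from collections import Counter, defaultdict


def extract_features(words):
    """Hypothetical feature extractor: distinct lowercased tokens of length >= 3."""
    return {w.lower() for w in words if len(w) >= 3}


class ToyNaiveBayes:
    """Illustrative Naive Bayes over word presence (not NLTK's implementation)."""

    def __init__(self, labeled_docs):
        self.doc_counts = Counter()              # documents seen per label
        self.word_counts = defaultdict(Counter)  # per label: documents containing each word
        for words, label in labeled_docs:
            self.doc_counts[label] += 1
            for word in extract_features(words):
                self.word_counts[label][word] += 1
        self.vocab = {w for counts in self.word_counts.values() for w in counts}

    def prob_classify(self, words):
        """Return a dict mapping each label to its normalized probability."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            for word in extract_features(words):
                if word not in self.vocab:
                    continue  # words never seen in training are ignored
                # Add-one smoothed estimate of P(word present | label)
                likelihood = (self.word_counts[label][word] + 1) / (n_docs + 2)
                score += math.log(likelihood)
            log_scores[label] = score
        # Normalize in a numerically stable way so the probabilities sum to 1
        top = max(log_scores.values())
        unnormalized = {lbl: math.exp(s - top) for lbl, s in log_scores.items()}
        norm = sum(unnormalized.values())
        return {lbl: value / norm for lbl, value in unnormalized.items()}


# Made-up training data, only to keep the sketch self-contained
train_data = [
    ("a great and wonderful film".split(), "pos"),
    ("wonderful acting great story".split(), "pos"),
    ("a boring and terrible film".split(), "neg"),
    ("terrible plot boring acting".split(), "neg"),
]
classifier = ToyNaiveBayes(train_data)
probs = classifier.prob_classify("what a great story".split())
classification = max(probs, key=probs.get)
print(classification, probs["pos"], probs["neg"])
```

The returned dict plays the role of the probability distribution in analyze: the label with the highest probability becomes the classification, and the per-label probabilities correspond to p_pos and p_neg.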
    

    The source for NLTK's Naive Bayes classifier is here. Its prob_classify method returns a probability distribution over the labels, which TextBlob's sentiment analyzer uses to build the result it returns.

    def prob_classify(self, featureset):