Tags: python, scikit-learn, sentiment-analysis, multiclass-classification

Sentiment Analysis with 3 classes (positive, neutral, and negative)?


I want to do sentiment analysis with 3 classes (positive, neutral, and negative). I have seen lots of work on sentiment analysis with two classes (positive and negative), but much less so for 3 classes. If I wanted to use a bag-of-words approach and a classifier such as Logistic Regression or SVMs in Scikit-learn, how would this work? What would the steps be for my output to predict with 3 classes?

Do I have to treat each class as a binary classification and do something to combine the results, or is sklearn able to do some processing for me so I do not have to specify this?


Solution

There are three possible approaches:

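    1. Use a classifier that is inherently multiclass, such as logistic regression or a decision tree, or wrap a binary algorithm such as an SVM in a one-vs-one or one-vs-rest scheme. scikit-learn does this for you: its classifiers accept three or more labels directly when fitted, so you do not have to combine binary results yourself.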
    2. If you want to exploit the fact that neutral texts are "somewhere between" positive and negative ones, you can use ordered classification models, such as ordered logistic regression in the mord package.
    3. If you want to exploit the ordering of the classes but stay within scikit-learn, I would suggest first fitting a regression model to your data with the labels encoded as numbers (e.g. a gradient boosting regressor), and then using logistic regression on top of its predictions.
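
The first approach can be sketched roughly as follows. This is a minimal illustration, not a tuned model: the six example texts and their labels are made up, and the variable names (logreg, svm_ovr, new_texts) are only placeholders. It shows a bag-of-words pipeline with logistic regression, plus a linear SVM wrapped explicitly in one-vs-rest.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy, made-up data purely for illustration; real work needs a labelled corpus.
texts = [
    "I love this phone, it is great",
    "Absolutely wonderful experience",
    "The package arrived on Tuesday",
    "It is a phone with a screen",
    "Terrible battery, I hate it",
    "Awful service and a horrible product",
]
labels = ["positive", "positive", "neutral", "neutral", "negative", "negative"]

# Bag-of-words counts feeding logistic regression, which accepts
# the three string labels directly.
logreg = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
logreg.fit(texts, labels)

# Same features, with a binary linear SVM wrapped explicitly in one-vs-rest.
svm_ovr = make_pipeline(CountVectorizer(), OneVsRestClassifier(LinearSVC()))
svm_ovr.fit(texts, labels)

new_texts = ["great wonderful phone", "horrible terrible battery"]
logreg_pred = logreg.predict(new_texts)   # one label per input text
svm_pred = svm_ovr.predict(new_texts)     # one label per input text
proba = logreg.predict_proba(new_texts)   # one column per class, ordered as logreg.classes_

print(logreg.classes_)
print(logreg_pred, svm_pred)
```

The explicit OneVsRestClassifier wrapper is only there to make the scheme visible; passing LinearSVC() to the pipeline on its own also handles three classes.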