Tags: nlp, nltk, sentiment-analysis, opennlp, naivebayes

How machine learning algorithms work for sentiment analysis


I found a good example of a Naive Bayes classifier here, but I am unable to understand the steps.

>>> from nltk.classify import SklearnClassifier
>>> from sklearn.naive_bayes import BernoulliNB
>>> from sklearn.svm import SVC
>>> train_data = [({"a": 4, "b": 1, "c": 0}, "ham"),
...               ({"a": 5, "b": 2, "c": 1}, "ham"),
...               ({"a": 0, "b": 3, "c": 4}, "spam"),
...               ({"a": 5, "b": 1, "c": 1}, "ham"),
...               ({"a": 1, "b": 4, "c": 3}, "spam")]
>>> classif = SklearnClassifier(BernoulliNB()).train(train_data)  # fit Bernoulli Naive Bayes
>>> test_data = [{"a": 3, "b": 2, "c": 1},
...              {"a": 0, "b": 3, "c": 7}]
>>> classif.classify_many(test_data)  # predict one label per test dict
['ham', 'spam']
>>> classif = SklearnClassifier(SVC(), sparse=False).train(train_data)  # same data, support vector machine
>>> classif.classify_many(test_data)
['ham', 'spam']

What are:

  1. Features in the code above?
  2. Actual Data for sentiment?
  3. "a": 4, "b": 1, "c": 0 ?
  4. ham, spam?

My basic purpose is to understand how the ML algorithm works. I am a newbie in sentiment analysis, so I hope someone can help.


Solution

  • The code sample you posted uses nonsense data to train a classifier.

    What are:

    1. Features in the code above?
    2. "a": 4, "b": 1, "c": 0 ?
    3. ham, spam?

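    The list train_data contains five training examples, each a pair of a feature dict and a label. The features are named "a", "b", and "c", and "a": 4, "b": 1, "c": 0 simply gives the values of those three features for one example (in a text task they might be word counts). The classification categories (labels) are "ham" and "spam"; "ham" is the usual name for a legitimate, non-spam message. Sentiment analysis might instead use the categories "positive" and "negative".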
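To make the "a": 4 notation concrete: as far as I know, NLTK's SklearnClassifier converts these feature dicts into a numeric matrix with scikit-learn's DictVectorizer before passing them to the estimator. Here is a minimal sketch of that conversion on its own, reusing the training dicts from the question:

```python
from sklearn.feature_extraction import DictVectorizer

# Feature dicts and labels copied from the question's train_data.
train_data = [({"a": 4, "b": 1, "c": 0}, "ham"),
              ({"a": 5, "b": 2, "c": 1}, "ham"),
              ({"a": 0, "b": 3, "c": 4}, "spam"),
              ({"a": 5, "b": 1, "c": 1}, "ham"),
              ({"a": 1, "b": 4, "c": 3}, "spam")]
featuresets = [features for features, _ in train_data]
labels = [label for _, label in train_data]

# Each distinct feature name becomes one column (sorted by name by default).
vectorizer = DictVectorizer(sparse=False)
X = vectorizer.fit_transform(featuresets)

print(vectorizer.feature_names_)  # column order: ['a', 'b', 'c']
print(X)       # one row per example; the first row is [4. 1. 0.]
print(labels)  # the targets the classifier learns to predict
```

So "a": 4 just means "column a holds 4 for this example". The estimator never sees the names, only the numbers in each column.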

    1. Actual Data for sentiment?

    There are no actual sentiment data in this demo.
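For a rough idea of what sentiment data could look like, here is a sketch using NLTK's own NaiveBayesClassifier. The four training sentences are invented toy examples, not a real corpus, and the word-presence extractor `word_features` is a hypothetical helper written in the spirit of the document-classification examples in the NLTK book.

```python
from nltk.classify import NaiveBayesClassifier

def word_features(text):
    """Hypothetical extractor: one boolean feature per word in the text."""
    return {f"contains({word})": True for word in text.lower().split()}

# Invented toy sentences; a real system would use a labelled corpus.
train_sentences = [("a great fun film", "positive"),
                   ("great acting", "positive"),
                   ("a boring film", "negative"),
                   ("awful boring plot", "negative")]
train_data = [(word_features(text), label) for text, label in train_sentences]

classifier = NaiveBayesClassifier.train(train_data)

print(word_features("great fun"))  # {'contains(great)': True, 'contains(fun)': True}
print(classifier.classify(word_features("great fun")))  # positive
print(classifier.classify(word_features("so boring")))  # negative
```

Replacing the toy sentences with a labelled corpus (Chapter 6 of the NLTK book uses NLTK's movie review corpus) turns this into a real, if basic, sentiment classifier.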

    Be aware that this snippet won't teach you anything about how the learning algorithm works. It only shows you the API of a black box that trains the classifier. To learn about machine learning itself, read about how the training works. To learn how to train a classifier without knowing what happens behind the scenes, start with Chapter 6 of the NLTK book.
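To see roughly what that black box does, here is a from-scratch sketch of Bernoulli Naive Bayes on the question's data. It assumes scikit-learn's BernoulliNB defaults as I understand them: each value is binarized as "greater than 0", add-one smoothing (alpha=1), and class priors taken from the label frequencies. The function names are hypothetical, and this is an illustration of the technique, not scikit-learn's actual implementation.

```python
import math

# The question's toy data: (feature dict, label) pairs.
train_data = [({"a": 4, "b": 1, "c": 0}, "ham"),
              ({"a": 5, "b": 2, "c": 1}, "ham"),
              ({"a": 0, "b": 3, "c": 4}, "spam"),
              ({"a": 5, "b": 1, "c": 1}, "ham"),
              ({"a": 1, "b": 4, "c": 3}, "spam")]
test_data = [{"a": 3, "b": 2, "c": 1},
             {"a": 0, "b": 3, "c": 7}]

def binarize(features, threshold=0.0):
    """Bernoulli NB only asks "is this feature present?" (value > threshold)."""
    return {name: value > threshold for name, value in features.items()}

def train_bernoulli_nb(data, alpha=1.0):
    """Training is just counting: class frequencies and per-class feature presence."""
    names = sorted({name for features, _ in data for name in features})
    labels = sorted({label for _, label in data})
    priors, likelihoods = {}, {}
    for label in labels:
        rows = [binarize(f) for f, y in data if y == label]
        priors[label] = len(rows) / len(data)
        # Smoothed P(feature present | label); alpha avoids zero probabilities.
        likelihoods[label] = {
            name: (sum(row.get(name, False) for row in rows) + alpha)
            / (len(rows) + 2 * alpha)
            for name in names
        }
    return names, priors, likelihoods

def classify(model, features):
    """Pick the label maximizing log P(label) + sum of log P(feature | label)."""
    names, priors, likelihoods = model
    present = binarize(features)
    scores = {}
    for label, prior in priors.items():
        score = math.log(prior)
        for name in names:
            p = likelihoods[label][name]
            # Absent features count too: they contribute log(1 - p).
            score += math.log(p if present.get(name, False) else 1.0 - p)
        scores[label] = score
    return max(scores, key=scores.get)

model = train_bernoulli_nb(train_data)
print(model[2]["ham"])  # {'a': 0.8, 'b': 0.8, 'c': 0.6}
print([classify(model, x) for x in test_data])  # ['ham', 'spam']
```

The predicted labels agree with the BernoulliNB output in the question. Training amounts to counting, and classification combines the evidence from every feature (adding log probabilities instead of multiplying probabilities, to avoid underflow).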