Sentiment analysis Google Prediction API


I am reading about the Google Prediction API and can't figure out a part of the docs.

From the use cases I am stuck a bit on this part:

Each line can only have one label assigned, but you can apply multiple labels to one example by repeating an example and applying different labels to each one. For example: "excited", "OMG! Just had a fabulous day!" "annoying", "OMG! Just had a fabulous day!" If you send a tweet to this model, you might get a classification something like this: "excited":0.6, "annoying":0.2.

Why would it return "excited":0.6, "annoying":0.2 when "excited" has no more features than "annoying"? Why is "excited" preferred?


Solution

  • It's not that the label "excited" is preferred. The numbers are probabilities: the model estimates how likely it is that the message should be classified as "excited" as opposed to "annoying."

    Suppose I have two sentiment classes, "bullish" and "bearish," and I train a model in the Prediction API with equal amounts of "bullish" and "bearish" training data. When I submit a message to the Prediction API to get its sentiment, it reads the text and assigns both a "bullish" and a "bearish" probability based on the words in the message. With only these two labels, the probabilities add up to 1.

    So again, it's not that one label is preferred over another. The model estimates that the message is 3 times as likely to be "excited" as "annoying" (0.6 versus 0.2).
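
To make the "bullish"/"bearish" explanation concrete, here is a minimal sketch of a word-based classifier that turns per-label scores into probabilities summing to 1. It is a toy naive Bayes model written for illustration only. The Prediction API does not document which algorithm it uses internally, and the training sentences, the `train` and `predict` helpers, and the test message are all hypothetical.

```python
from collections import Counter
import math

# Hypothetical training data: equal amounts of each label.
TRAINING = [
    ("bullish", "stock is going up strong buy"),
    ("bullish", "great earnings buy now"),
    ("bearish", "stock is going down sell"),
    ("bearish", "weak earnings sell now"),
]


def train(examples):
    """Count how many examples and which words were seen for each label."""
    label_counts = Counter()
    word_counts = {}
    for label, text in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return label_counts, word_counts


def predict(text, label_counts, word_counts):
    """Return one probability per label; the probabilities sum to 1."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_examples = sum(label_counts.values())
    log_scores = {}
    for label, n_examples in label_counts.items():
        counts = word_counts[label]
        # Add-one (Laplace) smoothing so unseen words do not zero out a label.
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n_examples / total_examples)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)
        log_scores[label] = score
    # Normalize the scores so they can be read as probabilities.
    top = max(log_scores.values())
    unnormalized = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {label: value / z for label, value in unnormalized.items()}


label_counts, word_counts = train(TRAINING)
probs = predict("strong earnings buy", label_counts, word_counts)
for label, p in probs.items():
    print(f"{label}: {p:.2f}")
```

Neither label is hard-coded as preferred here: "bullish" scores higher only because the words in the message appeared more often in the "bullish" training examples. Dividing one probability by the other gives the "times as likely" reading used above.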