python-2.7 pymongo textblob

Performing sentiment analysis in Python on a MongoDB collection of tweets stored as JSON


Hi, I have created a Python script that uses tweepy to stream tweets matching a keyword array and, via pymongo, saves each tweet into a MongoDB collection named after the keyword it was filtered by (i.e. tweets about apple are saved to an Apple collection). The script saves the tweets in JSON format, and now I want to perform sentiment analysis on them.

I have read a few tutorials on this and decided to use the NaiveBayesClassifier built into the TextBlob module. I created some training data and passed it to the classifier (just a normal list of strings, with the sentiment label after each sentence), but I am unsure how to apply this classifier to my already saved tweets. I thought it would work as shown below, but it throws this error:

Traceback (most recent call last):
  File "C:/Users/Philip/PycharmProjects/FinalYearProject/TrainingClassification.py", line 25, in <module>
    cl = NaiveBayesClassifier(train)
  File "C:\Python27\lib\site-packages\textblob\classifiers.py", line 192, in __init__
    self.train_features = [(self.extract_features(d), c) for d, c in self.train_set]
ValueError: too many values to unpack
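The ValueError comes from the line in the last frame, which unpacks every element of the training set as a `(d, c)` pair. Because `train` is a flat list, each element is a single string, so unpacking it into two names sees one value per character and fails. TextBlob's own examples pass a list of `(text, label)` tuples instead. The sketch below mimics that loop in plain Python; the helper `unpack_all` is hypothetical and not part of TextBlob.

```python
# A flat list, as in the question: labels are separate elements, not paired.
flat_train = ['I love this sandwich.', 'pos', 'My boss is horrible.', 'neg']

# A list of (text, label) tuples: the shape TextBlob's classifiers iterate over.
pair_train = [('I love this sandwich.', 'pos'), ('My boss is horrible.', 'neg')]


def unpack_all(train_set):
    """Hypothetical helper mimicking the `for d, c in self.train_set` loop."""
    return [(d, c) for d, c in train_set]


try:
    unpack_all(flat_train)
except ValueError as exc:
    # Each element is a whole string, so unpacking it into (d, c)
    # sees one value per character and raises ValueError.
    print('flat list fails: %s' % exc)

print(unpack_all(pair_train))
```

Rewriting each entry of `train` as a tuple, e.g. `('I love this sandwich.', 'pos')`, should let `NaiveBayesClassifier(train)` get past this line.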

Here is my code so far:

from textblob.classifiers import NaiveBayesClassifier
import pymongo

train = [
    'I love this sandwich.', 'pos',
    'I feel very good about these beers.', 'pos',
    'This is my best work.', 'pos',
    'What an awesome view', 'pos',
    'I do not like this restaurant', 'neg',
    'I am tired of this stuff.', 'neg',
    "I can't deal with this", 'neg',
    'He is my sworn enemy!', 'neg',
    'My boss is horrible.', 'neg'
]

cl = NaiveBayesClassifier(train)
conn = pymongo.MongoClient('localhost', 27017)
db = conn.TwitterDB

appleSentiment = cl.classify(db.Apple)
print ("Sentiment of Tweets about Apple is " + appleSentiment)

Any help would be greatly appreciated.


Solution

  • Quoting the TextBlob documentation:

    classify: Classifies a string of text.

    But you are passing it a collection instead: db.Apple is a pymongo collection, not a string of text.

    appleSentiment = cl.classify(db.Apple)
                                  ^
    

    You need to query the collection and pass the text of each tweet the query returns to classify. For example, to fetch a single tweet you can use find_one. For more information, the pymongo documentation is your friend.
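A minimal sketch of that approach, assuming each stored document keeps the tweet body under a `'text'` key (as the raw JSON from Twitter's v1.1 API does; adjust `text_field` if your script stored it differently). The helpers `classify_tweets` and `summarize` are hypothetical names. The point is that the classifier only ever receives a plain string, which is what `classify` expects.

```python
from collections import Counter


def classify_tweets(classifier, tweets, text_field='text'):
    """Classify each tweet document and return a list of (text, label) pairs.

    `classifier` is anything with a classify(str) method, e.g. a trained
    TextBlob NaiveBayesClassifier. `tweets` is any iterable of dicts, such as
    the cursor returned by pymongo's Collection.find().
    `text_field` is an assumption about how the tweets were stored.
    """
    results = []
    for doc in tweets:
        text = doc.get(text_field)
        if not text:
            # Skip documents without a text body (e.g. stream delete notices).
            continue
        results.append((text, classifier.classify(text)))
    return results


def summarize(results):
    """Count how many tweets received each label."""
    return Counter(label for _, label in results)
```

With the question's setup (a trained `cl` and `db = conn.TwitterDB`), this could be called as `classify_tweets(cl, db.Apple.find({}, {'text': 1}))`, and `summarize(...)` then gives a count per label rather than a single sentiment for the whole collection. To classify just one tweet, `cl.classify(db.Apple.find_one()['text'])` works the same way, although `find_one` returns `None` if the collection is empty.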