machine-learning sentiment-analysis naivebayes

How to calculate accuracy of a sentiment analysis algorithm (Naive Bayes)


I'm currently working on a Naive Bayes sentiment analysis program, but I'm not sure how to determine its accuracy. My code is:

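from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

x = df["Text"]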
y = df["Mood"]

test_size = 1785
x_train = x[:-test_size]
y_train = y[:-test_size]

x_test = x[-test_size:]
y_test = y[-test_size:]

count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(x_train)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
clf = MultinomialNB().fit(X_train_tfidf, y_train)

print(clf.predict(count_vect.transform(["Random text"])))

The prediction works fine for a single sentence that I give it. However, I want to run it on the 20% of my dataset that I held out (x_test and y_test) and calculate the accuracy, and I'm not sure how to approach this. Any help would be appreciated.

I've also tried the following:

predictions = clf.predict(x_test)

print(accuracy_score(y_test, predictions))

Which gives me the following error:

ValueError: could not convert string to float: "A sentence from the dataset"

Solution

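  • Before calling predictions = clf.predict(x_test), convert the test set to numeric features as well, using the vectorizer and TF-IDF transformer that were already fitted on the training data (call transform, not fit_transform):

    x_test = tfidf_transformer.transform(count_vect.transform(x_test))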
    

    A step-by-step walkthrough of this is available [here].
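
Putting the question and the fix together, the whole evaluation might look like the sketch below. The two short lists are hypothetical toy data standing in for df["Text"] and df["Mood"], and test_size is shrunk to 2 to match. The variable names otherwise follow the question. The point it illustrates is that the test text goes through the same fitted CountVectorizer and TfidfTransformer (transform only) before predict and accuracy_score are called.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data; in the question these come from df["Text"] and df["Mood"].
x = [
    "I love this movie",
    "What a wonderful day",
    "This is terrible",
    "I hate waiting in line",
    "Such a great experience",
    "The food was awful",
    "I love this great day",
    "This was awful and terrible",
]
y = ["happy", "happy", "sad", "sad", "happy", "sad", "happy", "sad"]

test_size = 2  # the question uses 1785
x_train, y_train = x[:-test_size], y[:-test_size]
x_test, y_test = x[-test_size:], y[-test_size:]

# Fit the vectorizer and the TF-IDF transformer on the training text only.
count_vect = CountVectorizer()
tfidf_transformer = TfidfTransformer()
X_train_counts = count_vect.fit_transform(x_train)
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
clf = MultinomialNB().fit(X_train_tfidf, y_train)

# Reuse the fitted objects on the test text: transform, not fit_transform.
X_test_counts = count_vect.transform(x_test)
X_test_tfidf = tfidf_transformer.transform(X_test_counts)

predictions = clf.predict(X_test_tfidf)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2f}")
```

Taking the last test_size rows as the test set assumes the rows are not ordered by label. If they might be, sklearn.model_selection.train_test_split with shuffling is a common alternative for producing the split.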