I am relatively new to machine learning. I am trying to do sentiment analysis prediction.
The Type column holds the sentiment of each tweet (positive, negative or neutral, encoded as 0, 1 and 2). The Tweet column holds the tweet text.
I am trying to predict the sentiments of a new set of tweets as 0, 1 or 2.
When I ran the code below, I got a dimension mismatch error.
import pandas as pd
train_tweets = pd.read_csv("tweets_type.csv")
from sklearn.model_selection import train_test_split
y = train_tweets.Type
X= train_tweets.Tweet
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=1)
from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer()
# learn the vocabulary from the training tweets only, then build the document-term matrix
train_X_dtm = vect.fit_transform(train_X)
test_X_dtm = vect.transform(test_X)
from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()
%time nb.fit(train_X_dtm, train_y)
# make class predictions for test_X_dtm
y_pred_class = nb.predict(test_X_dtm)
# calculate accuracy of class predictions
from sklearn import metrics
from sklearn.metrics import classification_report, confusion_matrix
metrics.accuracy_score(test_y, y_pred_class)
march_tweets = pd.read_csv("march_data.csv")
train_new_dtm = vect.transform(X)
new_pred_class = nb.predict(train_new_dtm)
The error I am getting is here:
I would be very glad if you could help me.
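It seems I made a mistake by fitting on X again after I had already fitted on train_X. I found out there is no point in fitting repeatedly once the model is fitted. Refitting the vectorizer on different text builds a different vocabulary, so the number of columns no longer matches what the classifier was trained on. So I removed that line and it worked perfectly.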
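To illustrate the point, here is a minimal sketch of the fit-once, transform-many pattern. The tiny in-memory lists are hypothetical stand-ins for tweets_type.csv and march_data.csv (I am not assuming anything about the real files), and the variable names are made up for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the real CSV files
train_texts = ["good day", "bad service", "okay I guess", "really good", "really bad"]
train_labels = [0, 1, 2, 0, 1]
new_texts = ["good service today", "bad day"]

# Fit the vectorizer ONCE, on the training text only
vect = CountVectorizer()
train_dtm = vect.fit_transform(train_texts)

nb = MultinomialNB()
nb.fit(train_dtm, train_labels)

# For new data, only transform: this reuses the training vocabulary,
# so the column count matches what the classifier expects
new_dtm = vect.transform(new_texts)
new_pred = nb.predict(new_dtm)
print(train_dtm.shape[1], new_dtm.shape[1])  # same number of columns

# Fitting a vectorizer on the new text instead builds a different vocabulary,
# so the column count differs and nb.predict would reject the matrix
refit_dtm = CountVectorizer().fit_transform(new_texts)
print(refit_dtm.shape[1])  # differs from train_dtm.shape[1]
```

Words in the new tweets that never appeared in the training tweets are simply ignored by `transform`, which is the expected behaviour here.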