python machine-learning sentiment-analysis naivebayes

Sentiment analysis for a single review: what should the second argument to classifier.fit(new_X_test, ) be?


This is the code for sentiment analysis of a single review. Since we don't have a dataset, I can't figure out what the second parameter of the classifier.fit method should be for the Naive Bayes model.

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Cleaning the text
import re
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords 
from nltk.stem.porter import PorterStemmer 
new_review = 'I love this restaurant so much'
new_review = re.sub('[^a-zA-Z]', ' ', new_review)
new_review = new_review.lower()
new_review = new_review.split()
ps = PorterStemmer()
all_stopwords = stopwords.words('english')
all_stopwords.remove('not')
new_review = [ps.stem(word) for word in new_review if word not in set(all_stopwords)]
new_review = ' '.join(new_review)
new_corpus = [new_review]


# Creating the bag-of-words model
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()
new_X_test = cv.fit_transform(new_corpus).toarray()
#new_X_test = cv.transform(new_corpus).toarray()

# Training the Naive Bayes model

from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(new_X_test, )

# predict the result
#y_pred = classifier.predict(X)
new_y_pred = classifier.predict(new_X_test)
print(new_y_pred)

#new_X_test = cv.transform(new_corpus).toarray()
#new_y_pred = classifier.predict(X)
#print(new_y_pred)

Solution

  • According to the documentation for sklearn.naive_bayes.GaussianNB.fit(), the second parameter is y, where:

    y: array-like of shape (n_samples,)
    Target values.

    The target value in your case is the sentiment of your single review, and y must contain one label for each row of X. Naive Bayes is a supervised classification algorithm. "Supervised" means that you have to guide the algorithm during training (model fitting) by providing the correct target values (labels).

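    The code, as it is now, does not make much sense. You cannot meaningfully train (fit) a model with only one sample, because a classifier that has seen a single label has nothing to distinguish it from. You need a dataset of many labeled reviews to fit the model, and only then can you predict the sentiment of new samples. The new review should also be encoded with cv.transform, using the vectorizer already fitted on the training reviews, rather than with cv.fit_transform.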
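
As a rough illustration of the point above, here is a minimal sketch of the fit-then-predict flow. The six reviews and their labels are made up purely for this example (they are not a real dataset), the clean helper is a hypothetical simplification of the question's preprocessing (no stemming or stopword removal, so that it runs without downloading NLTK data), and a real model would need far more data than this.

```python
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data, invented purely for illustration.
reviews = [
    "I love this restaurant so much",
    "The food was great and the staff was friendly",
    "Amazing taste, I will come back",
    "I did not like the food at all",
    "Terrible service and a bad meal",
    "The worst place, I hate it",
]
labels = [1, 1, 1, 0, 0, 0]  # y: one label per review (1 = positive, 0 = negative)

def clean(text):
    """Keep letters only and lowercase (simplified version of the question's cleaning)."""
    return re.sub('[^a-zA-Z]', ' ', text).lower()

corpus = [clean(r) for r in reviews]

# Fit the vectorizer on the training reviews only.
cv = CountVectorizer()
X_train = cv.fit_transform(corpus).toarray()

# The second argument to fit() is the array of labels.
classifier = GaussianNB()
classifier.fit(X_train, labels)

# Encode the new review with transform(), not fit_transform(),
# so that it uses the vocabulary learned during training.
new_review = "I love this restaurant so much"
new_X_test = cv.transform([clean(new_review)]).toarray()
new_y_pred = classifier.predict(new_X_test)
print(new_y_pred)
```

Note that X_train has one row per review and labels has one entry per row, which is the shape requirement on y quoted from the documentation.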