
Training accuracy on Naive Bayes in Python


I'm running a Naive Bayes model and can print my testing accuracy, but not the training accuracy.

#import libraries
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn import metrics
from sklearn.decomposition import PCA

#Naive Bayes model
gNB = GaussianNB()
gNB.fit(X_train, y_train)

nb_predict = gNB.predict(X_test)

print(metrics.classification_report(y_test, nb_predict))
accuracy = metrics.accuracy_score(y_test, nb_predict)
average_accuracy = np.mean(y_test == nb_predict) * 100
print("The average_accuracy is {0:.1f}%".format(average_accuracy))

#PRINTS The average_accuracy is 39.0%

#try to print training accuracy
print(metrics.classification_report(y_train, X_train))
accuracy = metrics.accuracy_score(y_train, X_train)
average_accuracy = np.mean(y_train == X_train) * 100
print("The average_accuracy is {0:.1f}%".format(average_accuracy))

When I reuse the code that worked for the testing accuracy to compute the training accuracy, I get the following error:

We can't have more than one value on y_type => The set is no more needed

ValueError: Classification metrics can't handle a mix of multiclass and multiclass-multioutput targets

What code works?


Solution

  • sklearn.metrics.accuracy_score expects y_true and y_pred to be 1d arrays of labels. So, in your code below:

    accuracy = metrics.accuracy_score(y_train, X_train)
    

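    y_train and X_train should both be 1-dimensional label arrays. X_train, however, is the 2d feature matrix, not a set of predictions, so scikit-learn treats it as a multiclass-multioutput target. That's why the error occurs. See the docs: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html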

    To measure the accuracy of your model on the training data, first get predictions for the training data from the fitted model, then compute the accuracy:

    y_predict_for_trainData = gNB.predict(X_train)
    accuracy_For_TrainData = metrics.accuracy_score(y_train, y_predict_for_trainData)
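
For reference, here is a minimal end-to-end sketch of the same idea. The question does not show how X_train and y_train were built, so the iris dataset and the train_test_split settings below are assumptions used only to make the example runnable. The variable names train_predict, test_predict and score_shortcut are also illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics

# Assumption: iris stands in for the question's (unknown) data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

gNB = GaussianNB()
gNB.fit(X_train, y_train)

# Metrics compare true labels with predicted labels,
# so predict on each split before scoring it
train_predict = gNB.predict(X_train)
test_predict = gNB.predict(X_test)

train_accuracy = metrics.accuracy_score(y_train, train_predict)
test_accuracy = metrics.accuracy_score(y_test, test_predict)

print("Training accuracy: {0:.1f}%".format(train_accuracy * 100))
print("Testing accuracy: {0:.1f}%".format(test_accuracy * 100))

# Shortcut: score() predicts on X_train and returns the mean accuracy
score_shortcut = gNB.score(X_train, y_train)
```

The same pattern applies to metrics.classification_report: pass y_train and train_predict, not y_train and X_train.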