Tags: python-3.x, typeerror, gaussian, naivebayes, sklearn-pandas

TypeError when running GaussianNB on data (Python beginner)


I'm trying to build a prediction model using GaussianNB.

I have a csv file that looks like this: csv data

My code is as follows:

encoded_df = pd.read_csv('path to file')

y = encoded_df.iloc[:,12]

X = encoded_df.iloc[:,0:12]
model = GaussianNB()
model.fit(X, y)

prediction_test_naive = ['427750', '426259', '2', '1610', '2', '1', '2', '1', '4', '1', '47', '2']

naive_predicted_class = model.predict(np.reshape(prediction_test_naive, [1, -1]))

print("predicted Casualty Severity: 1 = slight, 2 = serious, 3 = fatal: ", naive_predicted_class)

expected_bayes = y
predicted_bayes = model.predict(X)

classification_report_bayes = metrics.classification_report(expected_bayes, predicted_bayes)

print(classification_report_bayes)

When I run it, I get this TypeError:

TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('U32') dtype('U32') dtype('U32')

The error appears to come from line 7 of the example code above, but beyond that I don't know what causes it.

I'm not really sure how to fix this. I have a decision tree that works, but I would like to use Bayes' theorem too.


Solution

  • The error is due to this line:

    prediction_test_naive = ['427750', '426259', '2', '1610', '2', '1', '2', '1', '4', '1', '47', '2']
    

    Here you are declaring a list of strings (the single quotes around each value make it a string), which is then used for prediction. The dtype('U32') in the error message is NumPy's Unicode string type, which is why the subtraction fails. The model only accepts numerical values, so you need to convert the strings to numbers.

    You can do this in either of two ways:

    1) Declare prediction_test_naive as numbers, like this (note that the quotes have been removed):

    prediction_test_naive = [427750, 426259, 2, 1610, 2, 1, 2, 1, 4, 1, 47, 2]
    

    2) Convert prediction_test_naive to a numeric array using NumPy.

    After this line:

    prediction_test_naive = ['427750', '426259', '2', '1610', '2', '1', '2', '1', '4', '1', '47', '2']
    

    Do this:

    prediction_test_naive = np.array(prediction_test_naive, dtype=float)
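
To see the second option end to end, here is a minimal sketch. It does not use the CSV from the question: the training data, the feature count, and the names X_train, y_train and sample_as_strings are all made up for illustration. It only shows that a list of strings becomes a Unicode ('U') array, and that converting it with dtype=float gives the model something it can predict on.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy training data (not the CSV from the question):
# 4 rows, 3 numeric features, 2 well-separated classes.
X_train = np.array([
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [8.0, 9.0, 10.0],
    [8.2, 8.9, 10.1],
])
y_train = np.array([1, 1, 2, 2])

model = GaussianNB()
model.fit(X_train, y_train)

# A sample whose values arrived as text, like the list in the question.
sample_as_strings = ['8.1', '9.0', '10.0']

# Without conversion, NumPy stores these as Unicode strings (kind 'U').
string_kind = np.array(sample_as_strings).dtype.kind
print("dtype kind before conversion:", string_kind)

# Convert to float first, then reshape to a single row of features.
sample = np.array(sample_as_strings, dtype=float).reshape(1, -1)
print("dtype kind after conversion:", sample.dtype.kind)

predicted = model.predict(sample)
print("predicted class:", predicted)
```

If the values contain anything that is not a valid number (for example an empty string), the conversion itself raises a ValueError, which is usually easier to diagnose than the ufunc error from inside the model.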