Get Top 3 predicted classes from GaussianNB classifier python


I am trying to predict a class using GaussianNB, but I need the top 3 predicted classes so that I can compute a custom score for the prediction.

My training data has the form (x, y, class): given x and y, the model needs to predict the class.

The test variable contains the (x, y) values and testclass contains the class values.

test is a list in the following format:
Index Type Size Value
0 tuple 2 (0.6424, 0.8325)
1 tuple 2 (0.8493, 0.7848) 
2 tuple 2 (0.791, 0.4191)

testclass is a list of strings in the following format:
Index Type Size Value
0 str 1 1.274e+09
1 str 1 9.5047e+09

Code:

import csv
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.naive_bayes import GaussianNB


clf_pf = GaussianNB()
clf_pf.fit(train, trainclass)
print(clf_pf.score(test, testclass))

ff = clf_pf.predict_proba(test) 

How can I get the top 3 predicted classes from the variable ff above?

My ff data looks like this:
    0           1      2         3    4             5    6   7    8
0 1.80791e-05   0   0.00126251  0   6.38504e-256    0   0   0   0   
1 2.89477e-199  1.01093e-06 0   1.1056e-55  0   5.52213e-67 0   0
2 2.47755e-05   0   2.43499e-08 0   1.00392e-239    0   0   0   0
3 2.54941e-161  3.79815e-06 0   1.53516e-40 0   1.63465e-41 0   0

Solution

  • As said in the comment, ff has shape [n_samples, n_classes]. numpy.argsort returns, for each row, the column indices of the classes ordered by ascending probability, again as a matrix of shape [n_samples, n_classes]. You then take the last three elements of each row ([:, -3:]) and reverse their order ([:, ::-1]) so that the most probable class comes first:

    np.argsort(ff)[:, -3:][:, ::-1]
    

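    Note that the leading ":" in each slice ([:, ...]) just means "get all the rows". The result holds column indices of ff, not class labels; the columns follow the order of clf_pf.classes_, so indexing that array with the result gives the actual labels.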
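
The argsort approach above can be sketched end to end as follows. This is a minimal illustration, not the asker's actual pipeline: the training data, the class labels "a" to "d", and the two test points are synthetic stand-ins made up for the example. It also maps the column indices back to labels through classes_ and pulls out the matching probabilities with numpy.take_along_axis (which assumes NumPy 1.15 or later).

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumed for illustration only):
# four well-separated clusters of 2-D points, one cluster per class.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
labels = np.array(["a", "b", "c", "d"])
train = np.vstack([c + 0.1 * rng.randn(50, 2) for c in centers])
trainclass = np.repeat(labels, 50)

clf_pf = GaussianNB()
clf_pf.fit(train, trainclass)

# Hypothetical test points, given as a list of (x, y) tuples as in the question.
test = [(0.05, 0.02), (0.9, 0.95)]
ff = clf_pf.predict_proba(test)  # shape (n_samples, n_classes)

# Column indices of the three most probable classes, most probable first.
top3_idx = np.argsort(ff)[:, -3:][:, ::-1]

# Columns of ff follow clf_pf.classes_, so this yields the class labels.
top3_classes = clf_pf.classes_[top3_idx]

# Probabilities that go with each of the selected classes.
top3_probs = np.take_along_axis(ff, top3_idx, axis=1)

print(top3_classes)
print(top3_probs)
```

The first column of top3_classes should agree with clf_pf.predict(test), which is a quick sanity check before building a custom score on top of the three candidates.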