I am trying to predict a class using GaussianNB, but I need to get the top 3 predicted classes to create a custom score for the prediction.
My training data is x, y, class, where the class needs to be predicted from x and y.
The test variable contains the (x, y) values and testclass contains the class values.
test is a list in the following format:
Index Type Size Value
0 tuple 2 (0.6424, 0.8325)
1 tuple 2 (0.8493, 0.7848)
2 tuple 2 (0.791, 0.4191)
Test class data
Index Type Size Value
0 str 1 1.274e+09
1 str 1 9.5047e+09
Code:
import csv
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.naive_bayes import GaussianNB
clf_pf = GaussianNB()
clf_pf.fit(train, trainclass)
# Mean accuracy on the test set
print(clf_pf.score(test, testclass))
# Class probabilities, shape [n_samples, n_classes]
ff = clf_pf.predict_proba(test)
How can I get the top 3 predicted classes from the variable ff above?
My ff data looks like this:
0 1 2 3 4 5 6 7 8
0 1.80791e-05 0 0.00126251 0 6.38504e-256 0 0 0 0
1 2.89477e-199 1.01093e-06 0 1.1056e-55 0 5.52213e-67 0 0
2 2.47755e-05 0 2.43499e-08 0 1.00392e-239 0 0 0 0
3 2.54941e-161 3.79815e-06 0 1.53516e-40 0 1.63465e-41 0 0
As said in the comment, ff has shape [n_samples, n_classes]. Using numpy.argsort you obtain, for each row, the column indices of the classes ordered by ascending probability, again as a matrix of shape [n_samples, n_classes]. You then take the last three elements of every row ([:, -3:]) and reverse their order ([:, ::-1]) so that the most probable class comes first:
np.argsort(ff)[:, -3:][:, ::-1]
Note that the leading [:, in each slice just means "get all the rows".
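Keep in mind that argsort returns column indices, not class labels. With a fitted scikit-learn classifier, the columns of predict_proba follow the order of clf_pf.classes_, so you can index that array to recover the labels. Here is a minimal sketch of the whole thing on made-up data: the small ff array and the classes array below are hypothetical stand-ins for your predict_proba output and for clf_pf.classes_.

```python
import numpy as np

# Hypothetical probabilities: 2 samples, 4 classes (stand-in for ff).
ff = np.array([
    [0.10, 0.60, 0.05, 0.25],
    [0.30, 0.05, 0.45, 0.20],
])
# Hypothetical labels; with a fitted model this would be clf_pf.classes_.
classes = np.array(["a", "b", "c", "d"])

# Column indices of the 3 most probable classes per row, best first.
top3_idx = np.argsort(ff, axis=1)[:, -3:][:, ::-1]

# Map column indices to class labels and fetch the matching probabilities.
top3_classes = classes[top3_idx]
top3_proba = np.take_along_axis(ff, top3_idx, axis=1)

print(top3_idx)      # indices into the class axis
print(top3_classes)  # the corresponding labels
print(top3_proba)    # probabilities, in descending order per row
```

If several probabilities in a row are tied (for example many exact zeros, as in your ff), the relative order among the tied entries is not meaningful, so you may want to check top3_proba before using the lower-ranked classes in your custom score.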