python, machine-learning, scikit-learn, confusion-matrix

How to build a confusion matrix?


I have the following code, which compares nested and non-nested cross-validation of a KNN classifier.

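import numpy as np
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Number of random trials
NUM_TRIALS = 30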

# Load the dataset

X_iris = X.values
y_iris = y

# Set up possible values of parameters to optimize over
p_grid = {"n_neighbors": [1, 5, 10]}

# We will use a k-nearest neighbors classifier (the variable keeps the name svm)
svm = KNeighborsClassifier()

# Arrays to store scores
non_nested_scores = np.zeros(NUM_TRIALS)
nested_scores = np.zeros(NUM_TRIALS)

# Loop for each trial
for i in range(NUM_TRIALS):

    # Choose cross-validation techniques for the inner and outer loops,
    # independently of the dataset.
    # E.g "GroupKFold", "LeaveOneOut", "LeaveOneGroupOut", etc.
    inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
    outer_cv = KFold(n_splits=4, shuffle=True, random_state=i)

    # Non_nested parameter search and scoring
    clf = GridSearchCV(estimator=svm, param_grid=p_grid, cv=inner_cv)
    clf.fit(X_iris, y_iris)
    non_nested_scores[i] = clf.best_score_

    # Nested CV with parameter optimization
    nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
    nested_scores[i] = nested_score.mean()

score_difference = non_nested_scores - nested_scores

preds=clf.best_estimator_.predict(X_test)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, preds)
one, two, three, four,five,six,seven,eight,nine = confusion_matrix(y_test, preds).ravel()

My problem is with plotting the confusion matrix. I get the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-22-13536688e18b> in <module>()
     45 from sklearn.metrics import confusion_matrix
     46 cm = confusion_matrix(y_test, preds)
---> 47 one, two, three, four,five,six,seven,eight,nine = confusion_matrix(y_test, preds).ravel()
     48 cm = [[one,two],[three,four],[five,six],[seven,eight],[nine,eight]]
     49 ax= plt.subplot()

ValueError: too many values to unpack (expected 9)

I am not sure how to fix this. I have 9 target classes in my data set, stored in y.

[11 11 11 ... 33 33 33] #the target variables being : 11,12,13,21,22,23,31,32,33

And this is the head of my feature data set:

     Duration  Grand Mean  Max Mean Activation
0           64  136.772461           178.593750
1           67  193.445196           258.515625
2           67  112.382929           145.765625


Solution

  • The confusion matrix is already built by "cm = confusion_matrix(y_test, preds)". Because the target variable has 9 distinct labels, cm is a 9x9 matrix. There is no need to ravel it: ravel() flattens the 9x9 matrix into 81 values, and you are unpacking those into only 9 variables on the left side of the assignment. That is why you get the "too many values to unpack (expected 9)" error. If you want to plot the matrix, you can use the plot_confusion_matrix function (deprecated in scikit-learn 1.0 and removed in 1.2; ConfusionMatrixDisplay is its replacement in newer versions).
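
As a rough illustration of the point above, here is a minimal sketch. It assumes a recent scikit-learn with ConfusionMatrixDisplay and an installed matplotlib. The y_test and preds arrays are made-up stand-ins for the asker's data (random draws from the nine labels), so the counts themselves mean nothing; only the shapes matter.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Hypothetical stand-in data: the nine class labels from the question,
# with randomly generated "true" labels and "predictions".
labels = [11, 12, 13, 21, 22, 23, 31, 32, 33]
rng = np.random.default_rng(0)
y_test = rng.choice(labels, size=200)
preds = rng.choice(labels, size=200)

# Passing labels= fixes the row/column order and guarantees a 9x9 result
# even if some class happens to be missing from y_test or preds.
cm = confusion_matrix(y_test, preds, labels=labels)
print(cm.shape)         # (9, 9)
print(cm.ravel().size)  # 81, which is why unpacking into 9 names fails

# Per-class counts can be read by index instead of unpacking:
# the diagonal holds the correctly classified samples of each class.
correct_per_class = np.diag(cm)

# Plot the matrix directly from cm; no ravel() or manual reshaping needed.
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)
disp.plot()
```

In a script, as opposed to a notebook, a call to matplotlib.pyplot.show() after disp.plot() would be needed to display the figure.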