I have the following code that plots nested vs. non-nested cross-validation of a KNN algorithm.
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Number of random trials
NUM_TRIALS = 30
# Load the dataset
X_iris = X.values
y_iris = y
# Set up possible values of parameters to optimize over
p_grid = {"n_neighbors": [1, 5, 10]}
# We will use a k-nearest neighbors classifier
knn = KNeighborsClassifier()
# Arrays to store scores
non_nested_scores = np.zeros(NUM_TRIALS)
nested_scores = np.zeros(NUM_TRIALS)
# Loop for each trial
for i in range(NUM_TRIALS):
    # Choose cross-validation techniques for the inner and outer loops,
    # independently of the dataset.
    # E.g. "GroupKFold", "LeaveOneOut", "LeaveOneGroupOut", etc.
    inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
    outer_cv = KFold(n_splits=4, shuffle=True, random_state=i)
    # Non-nested parameter search and scoring
    clf = GridSearchCV(estimator=knn, param_grid=p_grid, cv=inner_cv)
    clf.fit(X_iris, y_iris)
    non_nested_scores[i] = clf.best_score_
    # Nested CV with parameter optimization
    nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
    nested_scores[i] = nested_score.mean()
score_difference = non_nested_scores - nested_scores
preds = clf.best_estimator_.predict(X_test)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, preds)
one, two, three, four,five,six,seven,eight,nine = confusion_matrix(y_test, preds).ravel()
The problem is with plotting the confusion matrix. I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-22-13536688e18b> in <module>()
45 from sklearn.metrics import confusion_matrix
46 cm = confusion_matrix(y_test, preds)
---> 47 one, two, three, four,five,six,seven,eight,nine = confusion_matrix(y_test, preds).ravel()
48 cm = [[one,two],[three,four],[five,six],[seven,eight],[nine,eight]]
49 ax= plt.subplot()
ValueError: too many values to unpack (expected 9)
I am not sure how to fix this. I have 9 target classes in my data set, stored in y:
[11 11 11 ... 33 33 33]  # the target classes are 11, 12, 13, 21, 22, 23, 31, 32, 33
And this is the head of my feature data set:
   Duration  Grand Mean  Max Mean Activation
0        64  136.772461           178.593750
1        67  193.445196           258.515625
2        67  112.382929           145.765625
The confusion matrix is built by "cm = confusion_matrix(y_test, preds)", and cm is a 9x9 matrix because your target variable has 9 distinct labels. Calling ravel() flattens that 9x9 matrix into 81 values, which you then try to unpack into only 9 variables on the left side of the assignment. That mismatch is what raises the "too many values to unpack (expected 9)" error. There is no need to ravel the matrix: if you want to plot it, you can use the plot_confusion_matrix function.
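As a rough illustration of the point above, here is a minimal sketch that shows the shapes involved and plots the matrix without unpacking it. The y_test and preds arrays are random placeholders (an assumption, since the real data is not shown), and the sketch uses ConfusionMatrixDisplay, which is available from scikit-learn 0.22 and is the replacement for plot_confusion_matrix in releases where that function has been removed (1.2 and later).

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# The nine class labels listed in the question.
labels = [11, 12, 13, 21, 22, 23, 31, 32, 33]

# Hypothetical stand-ins for y_test and preds: random labels, for illustration only.
rng = np.random.default_rng(0)
y_test = rng.choice(labels, size=200)
preds = rng.choice(labels, size=200)

# Passing labels= fixes the row/column order and guarantees a 9x9 result
# even if some class happens to be missing from y_test or preds.
cm = confusion_matrix(y_test, preds, labels=labels)
print(cm.shape)          # (9, 9)
print(cm.ravel().shape)  # (81,): this is why unpacking into 9 names fails

# Plot the matrix directly; no unpacking is needed.
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)
disp.plot(cmap="Blues")
plt.show()
```

With your own data, you would drop the random placeholders and pass your real y_test and preds to confusion_matrix instead.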