I have tried to create a confusion matrix on a knn-classifier in python, but the labeled classes are wrong.
The classes attribute of the dataset is 2 (for benign) and 4 (for malignant), but when I plot the confusion matrix, all labels are 2. The code I use is:
Data source: http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29
KNN classifier on Breast Cancer Wisconsin (Diagnostic) Data Set from UCI:
data = pd.read_csv('/breast-cancer-wisconsin.data')
data.replace('?', 0, inplace=True)
data.drop('id', 1, inplace = True)
X = np.array(data.drop(' class ', 1))
Y = np.array(data[' class '])
X_train, X_test, Y_train, Y_test = train_test_split(X,Y,test_size=0.2)
clf = neighbors.KNeighborsClassifier()
clf.fit(X_train, Y_train)
accuracy = clf.score(X_test, Y_test)
Plot confusion matrix
from sklearn.metrics import plot_confusion_matrix
disp = plot_confusion_matrix(clf, X_test, Y_test,
display_labels=Y,
cmap=plt.cm.Blues,)
The problem is that you're specifying the display_labels
argument with Y
, where it should just be the target names used for plotting. Now it's just using the two first values that appear in Y
, which happen to be 2, 2
. Note too that, as mentioned in the docs, the displayed labels will be the same as specified in labels
if it is provided, so you just need:
from sklearn.metrics import plot_confusion_matrix
fig, ax = plt.subplots(figsize=(8,8))
disp = plot_confusion_matrix(clf, X_test, Y_test,
labels=np.unique(y),
cmap=plt.cm.Blues,ax=ax)