I am trying to create a decision tree for a dataset and study the resulting confusion matrix. While the confusion matrix tells me how many misclassifications have occurred, it does not tell me which particular instances in X_train have been misclassified. I am trying to find out which instances were misclassified and which leaf node each of them ended up in. I know I can use decision_path(), but it doesn't tell me whether a particular instance was misclassified. My main goal is to identify where the confused and incorrectly classified instances end up. The following is my code:
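from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train = iris.data
Y_train = iris.target
clf = tree.DecisionTreeClassifier(max_depth=3, criterion='entropy')
clf.fit(X_train, Y_train)
pred = clf.predict(X_train)
# pred comes from X_train, so this is accuracy on the training data
print('Accuracy on training data is %.2f' % accuracy_score(Y_train, pred))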
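You have all the predictions in pred and all the true labels in Y_train.
Comparing them element-wise, pred != Y_train, gives a boolean mask that is True for every misclassified instance.
Your misclassified predictions are then simply pred[pred != Y_train].
If you want the corresponding feature rows, use X_train[pred != Y_train].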
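To also see which leaf node each misclassified instance ended up in, the same boolean mask can be combined with the classifier's apply() method, which returns the index of the leaf each sample lands in. The following is a minimal sketch built on the question's setup; the names mask, mis_idx, leaf_ids and mis_leaves are introduced here for illustration, and random_state=0 is an added assumption so that repeated runs build the same tree.

```python
import numpy as np
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
X_train, Y_train = iris.data, iris.target

# random_state is an assumption added here for reproducibility
clf = tree.DecisionTreeClassifier(max_depth=3, criterion='entropy', random_state=0)
clf.fit(X_train, Y_train)
pred = clf.predict(X_train)

# Boolean mask: True where the prediction differs from the true label
mask = pred != Y_train
mis_idx = np.flatnonzero(mask)   # row indices of misclassified instances

# apply() returns, for every sample, the node id of the leaf it ends up in
leaf_ids = clf.apply(X_train)
mis_leaves = leaf_ids[mask]

for i, leaf in zip(mis_idx, mis_leaves):
    print(f"row {i}: true={Y_train[i]}, predicted={pred[i]}, leaf node={leaf}")

# Count how many misclassified instances fall into each leaf
leaves, counts = np.unique(mis_leaves, return_counts=True)
for leaf, n in zip(leaves, counts):
    print(f"leaf {leaf}: {n} misclassified instance(s)")
```

The leaf ids printed here are node indices in clf.tree_, so they should match the node numbers shown by tree.plot_tree(clf, node_ids=True) or tree.export_graphviz(clf, node_ids=True).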