python machine-learning scikit-learn decision-tree

In a scikit-learn decision tree, how can you identify the decisions that lead to misclassification?

I am trying to build a decision tree for a dataset and study the resulting confusion matrix. While the confusion matrix tells me how many misclassifications occurred, it does not tell me which particular instances in X_train were misclassified. I want to find out which instances these are and which leaf node each of them ended up in. I know I can use decision_path(), but it doesn't tell me whether a particular instance was misclassified. My main goal is to identify where the confused and incorrectly classified instances end up. Here is my code:

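from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

iris = load_iris()

Y_train = iris.target
X_train = iris.data

clf = tree.DecisionTreeClassifier(max_depth=3, criterion='entropy')
clf.fit(X_train, Y_train)
pred = clf.predict(X_train)
# The predictions are made on the training data, so this is training accuracy
print('Accuracy on training data is %.2f' % accuracy_score(Y_train, pred))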

Solution

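You have all the predictions in pred and all the true labels in Y_train.

The comparison pred != Y_train gives a boolean mask that is True for every misclassified instance, so the misclassified predictions are simply pred[pred != Y_train].

If you want the features of those instances, use X_train[pred != Y_train].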
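To also see which leaf node each misclassified instance ended up in, the same boolean mask can be combined with the classifier's apply() method, which returns the index of the leaf each sample lands in. The following is a minimal sketch rather than a definitive recipe: the names mask, wrong_idx and leaf_ids are illustrative, and random_state=0 is an assumption added only to make the run repeatable.

```python
import numpy as np
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
X_train, Y_train = iris.data, iris.target

# random_state is an assumption, added only so the tree is reproducible
clf = tree.DecisionTreeClassifier(max_depth=3, criterion='entropy', random_state=0)
clf.fit(X_train, Y_train)
pred = clf.predict(X_train)

mask = pred != Y_train            # True where the prediction is wrong
wrong_idx = np.flatnonzero(mask)  # row indices of misclassified instances
leaf_ids = clf.apply(X_train)     # leaf node index for every instance

# One line per misclassified instance: row, true label, prediction, leaf
for i in wrong_idx:
    print(f"row {i}: true={Y_train[i]}, predicted={pred[i]}, leaf={leaf_ids[i]}")

# Count how many misclassified instances fall into each leaf
leaves, counts = np.unique(leaf_ids[mask], return_counts=True)
for leaf, count in zip(leaves, counts):
    print(f"leaf {leaf}: {count} misclassified")
```

The leaf indices refer to nodes of clf.tree_, so they can be matched against the node numbering used by decision_path() or a plotted tree.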