I want to do feature selection on my data set with CART and C4.5 decision trees: apply a decision tree to the data set and then extract the features the algorithm used to build the tree. So I need to return the features that were used in the created tree. I use DecisionTreeClassifier from the sklearn.tree module. I need a method or function that returns the features used in the created tree, so I can treat them as the most important features in my main modeling algorithm.
You can approach the problem as below.
I assume you already have train (x_train, y_train) and test (x_test, y_test) sets, with x_train as a pandas DataFrame.
from pandas import DataFrame
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score

# Fit the tree and evaluate it on the held-out test set
tree_clf1 = DecisionTreeClassifier().fit(x_train, y_train)
y_pred = tree_clf1.predict(x_test)
print(confusion_matrix(y_test, y_pred))
print("\n\nAccuracy: {:,.2f}%".format(accuracy_score(y_test, y_pred) * 100))
# Note: precision_score, recall_score and f1_score default to average='binary';
# for a multiclass target, pass average='weighted' (or 'macro') to each of them.
print("Precision: {:,.2f}%".format(precision_score(y_test, y_pred) * 100))
print("Recall: {:,.2f}%".format(recall_score(y_test, y_pred) * 100))
print("F1-Score: {:,.2f}%".format(f1_score(y_test, y_pred) * 100))

# Rank the features by the importance the fitted tree assigned to them
feature_importances = DataFrame(tree_clf1.feature_importances_,
                                index=x_train.columns,
                                columns=['importance']).sort_values('importance',
                                                                    ascending=False)
print(feature_importances)
The printed DataFrame shows which features are important for your classification; features with an importance of zero were not used in any split of the tree.
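If you want the features that actually appear as split nodes in the tree (rather than ranking by importance), you can read them from the fitted tree's `tree_.feature` array, where leaf nodes are marked with `_tree.TREE_UNDEFINED`. Below is a minimal, self-contained sketch using the iris dataset as a stand-in for your own x_train / y_train:

```python
# Sketch: extract the features actually used as split nodes in a fitted tree.
# The iris dataset here is only a stand-in for your own data.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, _tree

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# tree_.feature holds, for every node, the index of the feature it splits on;
# leaf nodes are marked with _tree.TREE_UNDEFINED, so we filter them out.
used_idx = sorted({f for f in clf.tree_.feature if f != _tree.TREE_UNDEFINED})
used_features = [iris.feature_names[i] for i in used_idx]
print(used_features)
```

The resulting list is exactly the subset of features you can feed into your main algorithm; with a DataFrame you would use `x_train.columns[i]` instead of `iris.feature_names[i]`.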