I was trying to create a decision tree model using scikit-learn's tree module. Once I generated the model, I visualized the tree and the criteria on which the decisions were based. However, I would like to modify the thresholds in some criteria manually to see how the output changes. Is there any method to do so? Or is there a library that converts the decision tree into a set of if-else statements once it has learned the required thresholds from the dataset, and vice versa?
I know that the thresholds chosen by the module are based on impurity metrics such as Gini impurity or information gain. However, I would still like to experiment with those threshold values.
Thanks!
Yes, you can easily do this.
A scikit-learn decision tree exposes its underlying tree through the tree_ attribute. Among other things, tree_ has a threshold attribute, which is a NumPy array containing the threshold values of all nodes. You can modify this array in place, thereby changing the thresholds.
For example:
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
dt = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(dt.tree_.threshold)  # All thresholds; the length equals dt.tree_.node_count
dt.tree_.threshold[3] = 10.0  # Manually modify the threshold of node 3
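Only nodes that actually split the data use their threshold; leaves just carry a placeholder value, so editing a leaf's entry has no effect on predictions. As a minimal sketch (the random_state and the variable names tree, is_split and split_nodes are my own choices for illustration), the split nodes can be listed by checking which nodes have children:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# random_state is fixed only to make the sketch reproducible (my assumption)
dt = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

tree = dt.tree_
# Leaves have no children: children_left is -1 for them
is_split = tree.children_left != -1
split_nodes = np.flatnonzero(is_split)

# Print the rule stored at every split node
for node in split_nodes:
    print(f"node {node}: feature {tree.feature[node]} <= {tree.threshold[node]:.4f}")
```

Any index in split_nodes is a safe candidate for a manual threshold change.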
To verify, compare the accuracy on a separate test set before and after this modification. Assuming you modified a non-leaf node, you should notice a change (most likely for the worse).
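A rough sketch of that check, assuming a 70/30 train/test split and a fixed random_state (both my own choices), and halving the root threshold as an arbitrary change. The root is used because it is always a split node in a tree that has at least one split:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

dt = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
acc_before = dt.score(X_test, y_test)

# Node 0 is the root; halving its threshold is an arbitrary experiment
old_threshold = dt.tree_.threshold[0]
dt.tree_.threshold[0] = old_threshold * 0.5

acc_after = dt.score(X_test, y_test)
print(f"root threshold: {old_threshold:.4f} -> {dt.tree_.threshold[0]:.4f}")
print(f"test accuracy:  {acc_before:.4f} -> {acc_after:.4f}")
```

The edit only moves the split point. The class counts stored in the leaves still come from the original fit, so the modified tree is no longer consistent with its training data.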