python-3.x, machine-learning, scikit-learn, data-science, decision-tree

DecisionTreeClassifier in scikit-learn: tree_.feature returns the value -2, what does it mean?


The tree_.feature array for DecisionTreeClassifier and DecisionTreeRegressor contains the value -2 a few times toward the end. Is this because those entries are leaf nodes? Can I assume that any node with a feature value of -2 is a leaf node?


Solution

  • Generally, yes. The attribute .tree_.feature holds -2 for a node when there is no split on that node, which occurs if and only if the node is a leaf (at least when the tree was grown without pruning, i.e. with the default ccp_alpha=0.0; note that ccp_alpha is passed to the estimator's constructor, not to fit()).

    While it's not 100% clear in the help() documentation, a reference to this can be found inside the code here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx

    where the feature value of a leaf node is set to the module-level constant TREE_UNDEFINED = -2.