Tags: tree, scikit-learn, boosting

What are the leaf values in an sklearn GBDT, and how do I obtain them?


I can export the structure of a GBDT to an image with the tree.export_graphviz function:

``` Python3

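from subprocess import check_call

from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

clf = GradientBoostingClassifier(n_estimators=1)  # set to 1 for the sake of simplicity
iris = load_iris()

clf = clf.fit(iris.data, iris.target)
# estimators_[0, 0] is the regression tree for class 0 in the first boosting stage
tree.export_graphviz(clf.estimators_[0, 0], out_file='tree.dot')
# Render the .dot file to PNG (requires the Graphviz `dot` executable)
check_call(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png'])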

```

This is the obtained image.

I am wondering what the values on the leaves are, and how I can obtain them.

I have tried the apply and decision_function methods, but neither gives me these values.


Solution

  • You can access the leaf properties of each individual tree through its internal tree_ object and its attributes. export_graphviz uses exactly this approach.

    Consider this code. For each attribute, it gives an array of its values over all the tree nodes:

    print(clf.estimators_[0,0].tree_.feature)
    print(clf.estimators_[0,0].tree_.threshold)
    print(clf.estimators_[0,0].tree_.children_left)
    print(clf.estimators_[0,0].tree_.children_right)
    print(clf.estimators_[0,0].tree_.n_node_samples)
    print(clf.estimators_[0,0].tree_.value.ravel())
    

    The output will be

    [ 2 -2 -2]
    [ 2.45000005 -2.         -2.        ]
    [ 1 -1 -1]
    [ 2 -1 -1]
    [150  50 100]
    [  3.51570624e-16   2.00000000e+00  -1.00000000e+00]
    

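    That is, your tree has 3 nodes, and the first one (the root, node 0) compares the value of feature 2 with the threshold 2.45. Nodes 1 and 2 are leaves: scikit-learn marks a leaf with -1 in children_left and children_right, and with -2 in feature and threshold.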

    The values in the root node, the left leaf, and the right leaf are about 3.5e-16 (effectively zero), 2, and -1 respectively.

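    These values are not obvious to interpret, though, because the tree was fitted to the gradient of the GBDT loss function rather than to the class labels. They are contributions to the raw score for one class, not class probabilities.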
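
To pick out only the leaf values and see how they are used, something like the following sketch may help. It assumes the same setup as in the question (iris data, a single boosting stage) and relies on the -1 marker in children_left to identify leaves. The variable names (`reg`, `leaf_ids`, `offset`, and so on) are only illustrative. The last part checks, for this default configuration, that the raw score returned by decision_function for class 0 is a constant initial score plus learning_rate times the leaf value each sample falls into.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

iris = load_iris()
clf = GradientBoostingClassifier(n_estimators=1).fit(iris.data, iris.target)

reg = clf.estimators_[0, 0]  # regression tree for class 0, boosting stage 0
t = reg.tree_

# Leaves are the nodes without children; scikit-learn marks them with -1.
leaf_ids = np.flatnonzero(t.children_left == -1)
leaf_values = t.value.ravel()[leaf_ids]
for node_id, value in zip(leaf_ids, leaf_values):
    print(f"leaf node {node_id}: value {value:.4f}")

# apply() on the individual tree returns, for each sample, the id of the
# leaf it lands in, so the per-sample leaf value is a lookup in tree_.value.
sample_leaf = reg.apply(iris.data)
sample_value = t.value.ravel()[sample_leaf]

# With one stage and the default init, the class-0 raw score should equal
# a constant initial score plus learning_rate * leaf value.
raw = clf.decision_function(iris.data)[:, 0]
offset = raw - clf.learning_rate * sample_value
print("spread of the remaining offset:", np.ptp(offset))
```

If the printed spread is essentially zero, the leaf values are the per-stage increments to the raw score, scaled by learning_rate. With more stages, the same lookup would have to be repeated for every entry of clf.estimators_ and summed.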