graphviz decision-tree xgboost

What is the meaning of the value of the boosted tree?


I plotted a tree, and at the ends of the tree (in the leaves) some values are shown. What do they mean?

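import xgboost as xgb

# model parameters, collected in the dict that xgb.train expects
param = {
    'colsample_bytree': 0.4,
    'objective': 'binary:logistic',
    'learning_rate': 0.05,
    'eval_metric': 'auc',
    'max_depth': 8,
    'min_child_weight': 4,
    'seed': 7,
}
n_estimators = 5000  # not a xgb.train parameter; rounds are set by num_boost_round

# create and train model
# (dtrain and best_iteration are assumed to be defined earlier)
bst = xgb.train(param,
                dtrain,
                num_boost_round=best_iteration)

# render the trees with graphviz
dot = xgb.to_graphviz(bst, rankdir='LR')
dot.render("trees1")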

I thought it was a predicted probability, but the leaf values only range up to about 0.01, whereas a predicted probability ranges up to 1. Maybe it is the predicted probability divided by 10 (e.g. a leaf value of 0.01 means a predicted probability of 0.1)?

And why do some leaves have negative values (e.g. -0.01)? Thank you.

[Image: part of the tree]


Solution

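  • The value of a leaf is not your "eval_metric"; the metric (AUC here) is only used to evaluate the model. With objective = 'binary:logistic', each leaf holds a raw score: a contribution in log-odds, already scaled by learning_rate. To get the predicted probability for a sample, sum the leaf values it reaches across all trees, add the model's base score (in log-odds), and apply the logistic (sigmoid) function. That is why a single leaf value is small (around 0.01 with learning_rate = 0.05) and why it can be negative: a negative leaf pushes the prediction toward class 0.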

    For comparison, a fitted scikit-learn decision tree (not an XGBoost Booster) exposes its structure through these attributes:

    n_nodes = estimator.tree_.node_count
    children_left = estimator.tree_.children_left
    children_right = estimator.tree_.children_right
    feature = estimator.tree_.feature
    threshold = estimator.tree_.threshold
    

    From the scikit-learn docs: https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html#sphx-glr-auto-examples-tree-plot-unveil-tree-structure-py

    I can't find it in the docs, but "tree_.impurity" exists as well.
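
One way to make sense of the small and negative leaf values is to treat them as log-odds contributions, which is how XGBoost's 'binary:logistic' objective uses them. The sketch below is plain Python, not XGBoost code: the leaf values are made up, the helper names are hypothetical, and a starting log-odds of 0.0 (probability 0.5) is an assumption, since the actual base score depends on the XGBoost version and settings.

```python
import math


def sigmoid(margin):
    """Map a raw log-odds score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-margin))


def predict_proba(leaf_values, base_margin=0.0):
    """Sum the leaf value a sample reaches in each tree, then apply sigmoid.

    leaf_values: one number per tree (hypothetical values in this sketch).
    base_margin: assumed starting log-odds; 0.0 corresponds to p = 0.5.
    """
    return sigmoid(base_margin + sum(leaf_values))


# Hypothetical leaves reached by one sample in four trees.
few_trees = predict_proba([0.01, -0.01, 0.008, 0.012])

# A single negative leaf on its own pulls the probability below 0.5.
negative_only = predict_proba([-0.01])

# Many small positive leaves add up to a confident prediction.
many_trees = predict_proba([0.01] * 1000)

print(few_trees, negative_only, many_trees)
```

With only a few trees the result stays close to 0.5, while a thousand small contributions of the same sign push it close to 1, so leaf values near 0.01 are compatible with probabilities spanning the whole range. On a real model, `bst.predict(dtrain, output_margin=True)` returns the summed raw score before the sigmoid, which can be compared against the leaf values in the plot.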