How does LightGBM calculate the leaf values for the first tree in regression?

When plotting the first tree from a regression using create_tree_digraph, the leaf values make no sense to me. For example:

from sklearn.datasets import load_boston
X, y = load_boston(return_X_y=True)

import lightgbm as lgb
data = lgb.Dataset(X, label=y)

bst = lgb.train({}, data, num_boost_round=1)

Gives the following tree:


Focusing on leaf 3, for example, it seems like these are the fitted values:

bst.predict(X, num_iteration=0)[X[:,5]>7.437]
array([24.78919238, 24.78919238, 24.78919238, 24.78919238, 24.78919238,
       24.78919238, 24.78919238, 24.78919238, 24.78919238, 24.78919238,
       24.78919238, 24.78919238, 24.78919238, 24.78919238, 24.78919238,
       24.78919238, 24.78919238, 24.78919238, 24.78919238, 24.78919238,
       24.78919238, 24.78919238, 24.78919238, 24.78919238, 24.78919238,
       24.78919238, 24.78919238, 24.78919238, 24.78919238, 24.78919238])

But these seem like terrible predictions compared to the obvious and trivial method of taking the mean of the actual target values that fall into that leaf:

array([38.7, 43.8, 50. , 50. , 50. , 50. , 39.8, 50. , 50. , 42.3, 48.5,
       50. , 44.8, 50. , 37.6, 46.7, 41.7, 48.3, 42.8, 44. , 50. , 43.1,
       48.8, 50. , 43.5, 35.2, 45.4, 46. , 50. , 21.9])

What am I missing here?


  • LightGBM's leaf node output values show the prediction from that leaf node, which includes multiplying by the learning rate.

    The default learning rate is 0.1. If you change it to 1.0, you should see that the output value for leaf 3 is 45.097 (exactly the mean of y for all observations that fall into that leaf node).

    from sklearn.datasets import load_boston
    X, y = load_boston(return_X_y=True)
    import lightgbm as lgb
    data = lgb.Dataset(X, label=y)
    bst = lgb.train({"learning_rate": 1.0}, data, num_boost_round=1)

    Similarly, if you set learning_rate to something extremely small, you should see that the leaf nodes of the first tree have values very close to the global mean of y. The global mean of y (y.mean()) in your example data is 22.532.

    bst = lgb.train({"learning_rate": 0.0000000000001}, data, num_boost_round=1)
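
    The three learning rates above can be compared without training a model. This is only a sketch of the arithmetic, not LightGBM's actual code path: it assumes the default squared-error objective, no regularization, and that boosting starts from the global mean of y. The helper first_tree_leaf_value is a hypothetical name introduced for illustration, and the global mean is the rounded 22.532 quoted above.

```python
# Target values of the 30 rows that land in leaf 3 (copied from the question).
leaf_targets = [
    38.7, 43.8, 50.0, 50.0, 50.0, 50.0, 39.8, 50.0, 50.0, 42.3, 48.5,
    50.0, 44.8, 50.0, 37.6, 46.7, 41.7, 48.3, 42.8, 44.0, 50.0, 43.1,
    48.8, 50.0, 43.5, 35.2, 45.4, 46.0, 50.0, 21.9,
]
GLOBAL_MEAN = 22.532  # y.mean() as quoted in the answer (rounded)


def first_tree_leaf_value(targets, global_mean, learning_rate):
    """Hypothetical helper: start at the global mean and move toward the
    leaf mean by a fraction equal to the learning rate.

    Assumes squared-error loss, no regularization, and a starting
    prediction equal to the global mean of y.
    """
    leaf_mean = sum(targets) / len(targets)
    return global_mean + learning_rate * (leaf_mean - global_mean)


for lr in (1.0, 0.1, 1e-13):
    value = first_tree_leaf_value(leaf_targets, GLOBAL_MEAN, lr)
    print(f"learning_rate={lr:g}: leaf value = {value:.2f}")
```

    With these inputs the three results land close to the leaf mean, the 24.789 from the question, and the global mean respectively. Small differences in the last digits come from using the rounded global mean.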

    I don't recommend setting learning_rate=1.0 in practice, as it can lead to worse accuracy. For gradient boosting libraries like LightGBM, it's preferred to use a learning rate < 1.0 and a higher num_boost_round (try 100), so that each individual tree only has a limited impact on the final prediction.

    If you do that, you'll find that each subsequent tree added to the model should add a small incremental improvement in accuracy. This is what happened in your original example. The global mean of y (y.mean()) in your example data is 22.532. For a group of records with local mean 45.097 and with learning rate set to 0.1, the first tree predicted 24.789, which is the global mean plus one tenth of the gap to the local mean: 22.532 + 0.1 * (45.097 - 22.532) ≈ 24.79. Not a great prediction by itself, but a better prediction for that group than the global mean.
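
    To illustrate the incremental improvement over many rounds, here is a deliberately simplified sketch. It assumes every tree isolates the same group of rows, which real LightGBM does not guarantee since each tree chooses its own splits, and it assumes squared-error loss with no regularization. The function boosted_predictions is a hypothetical name, and the numbers are the rounded values quoted above.

```python
def boosted_predictions(leaf_mean, start, learning_rate, num_rounds):
    """Return the prediction for one group of rows after each boosting round.

    Simplifying assumption: every tree isolates the same group, so each
    round adds learning_rate * (current mean residual of that group).
    """
    prediction = start
    history = []
    for _ in range(num_rounds):
        residual = leaf_mean - prediction  # what is still unexplained
        prediction += learning_rate * residual
        history.append(prediction)
    return history


history = boosted_predictions(leaf_mean=45.097, start=22.532,
                              learning_rate=0.1, num_rounds=100)
for round_number in (1, 10, 50, 100):
    print(f"after {round_number:3d} trees: {history[round_number - 1]:.3f}")
```

    Under these assumptions the first entry is the first-tree value discussed above, and each later round closes another 10% of the remaining gap, so the prediction approaches the local mean without any single tree dominating.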