
How to reproduce the XGBoost splits with if statements?


I am using xgboost and I need to reproduce its output using only if statements and additions. However, I'm not getting the right output.

Let's create random data:

import numpy as np
import xgboost as xgb
import os

np.random.seed(42)
data = np.random.rand(100, 5)
label = np.random.randint(2, size=100)
dtrain = xgb.DMatrix(data, label=label)

Then create a basic model:

param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic'}

num_round = 3
bst = xgb.train(param, dtrain, num_round)

Now I save the booster trees' rules:

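savefile = 'dump.raw.txt'
bst.dump_model(savefile)  # write every tree's split rules as plain text
os.startfile(savefile)    # open the dump in the default editor (Windows only)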

This is what I get:

# booster[0]:
# 0:[f3<0.905868173] yes=1,no=2,missing=1
#   1:[f0<0.0309647173] yes=3,no=4,missing=3
#       3:leaf=0.5
#       4:leaf=-0.561797738
#   2:[f3<0.956529975] yes=5,no=6,missing=5
#       5:leaf=0.909090936
#       6:leaf=-0.5
# booster[1]:
# 0:[f2<0.863453388] yes=1,no=2,missing=1
#   1:[f2<0.71782589] yes=3,no=4,missing=3
#       3:leaf=-0.0658661202
#       4:leaf=1.03587329
#   2:[f0<0.345137954] yes=5,no=6,missing=5
#       5:leaf=0.0854885057
#       6:leaf=-1.15627134
# booster[2]:
# 0:[f2<0.46345675] yes=1,no=2,missing=1
#   1:[f2<0.18197903] yes=3,no=4,missing=3
#       3:leaf=-0.321362823
#       4:leaf=1.05848205
#   2:[f3<0.704104543] yes=5,no=6,missing=5
#       5:leaf=-0.623027325
#       6:leaf=0.46367079

My test sample is the first row of the data, and I want to reproduce bst.predict(dtrain)[0] for it:

data[0]
array([0.37454012, 0.95071431, 0.73199394, 0.59865848, 0.15601864])

If I follow the splits for this row and sum the leaf values I land on, I get this:

-0.5618 + 1.0358 - 0.6230 = -0.14899

It should be 0.48283.

How do I get the right output value?


Solution

  • How do I get the right output value?

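    You appear to be dealing with a binary classification problem (integer labels of 0 and 1, with the binary:logistic objective). In that case the sum of the leaf values is a raw score (log-odds), not a probability, so you need to apply the sigmoid function to the boosted score.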

    Re-doing your computation with more precision:

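    import numpy
    # Raw score: sum of the leaf values reached in the three trees
    x = -0.561797738 + 1.03587329 + -0.623027325
    # The sigmoid turns the raw score into a probability
    1. / (1. + numpy.exp(-x))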
    

    ... yields 0.46283075.
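
To answer the title question more directly, the three dumped trees can be written out as nested if statements. This is only a minimal sketch under a few assumptions. The function names (tree0, tree1, tree2, predict_proba) are made up for illustration. Missing values are not handled (the dump sends them down the "yes" branch). The model's base_score is assumed to be 0.5, which adds nothing to the raw score under binary:logistic. If your xgboost version uses a different base_score, its logit would presumably have to be added to the sum.

```python
import math

# Each function mirrors one booster from the text dump.
# f is a sequence of the five feature values (f[0] .. f[4]).
# Assumption: no missing values (NaN), so the "missing=" branch is ignored.

def tree0(f):
    if f[3] < 0.905868173:
        if f[0] < 0.0309647173:
            return 0.5
        return -0.561797738
    if f[3] < 0.956529975:
        return 0.909090936
    return -0.5

def tree1(f):
    if f[2] < 0.863453388:
        if f[2] < 0.71782589:
            return -0.0658661202
        return 1.03587329
    if f[0] < 0.345137954:
        return 0.0854885057
    return -1.15627134

def tree2(f):
    if f[2] < 0.46345675:
        if f[2] < 0.18197903:
            return -0.321362823
        return 1.05848205
    if f[3] < 0.704104543:
        return -0.623027325
    return 0.46367079

def raw_score(f):
    # Additions only: sum the leaf reached in every tree.
    # Assumption: base_score = 0.5, i.e. a zero offset on the log-odds scale.
    return tree0(f) + tree1(f) + tree2(f)

def predict_proba(f):
    # binary:logistic reports sigmoid(raw score)
    return 1.0 / (1.0 + math.exp(-raw_score(f)))

row = [0.37454012, 0.95071431, 0.73199394, 0.59865848, 0.15601864]
margin = raw_score(row)
prob = predict_proba(row)
print(margin, prob)
```

For the sample row this walks to the same three leaves as in the question, so the raw score is about -0.14895 and the probability about 0.46283. To check the intermediate sum against the model itself, bst.predict(dtrain, output_margin=True)[0] returns the raw score before the sigmoid is applied.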