I am using xgboost and need to reproduce its output using only if statements and additions. However, I'm not getting the right output.
Let's create random data:
import numpy as np
import xgboost as xgb
import os
np.random.seed(42)
data = np.random.rand(100, 5)
label = np.random.randint(2, size=100)
dtrain = xgb.DMatrix(data, label=label)
Then create a basic model:
param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic'}
num_round = 3
bst = xgb.train(param, dtrain, num_round)
Now I save the booster trees' rules:
savefile = 'dump.raw.txt'
bst.dump_model(savefile)
os.startfile(savefile)  # opens the dump in the default viewer (Windows only)
This is what I get:
# booster[0]:
# 0:[f3<0.905868173] yes=1,no=2,missing=1
# 1:[f0<0.0309647173] yes=3,no=4,missing=3
# 3:leaf=0.5
# 4:leaf=-0.561797738
# 2:[f3<0.956529975] yes=5,no=6,missing=5
# 5:leaf=0.909090936
# 6:leaf=-0.5
# booster[1]:
# 0:[f2<0.863453388] yes=1,no=2,missing=1
# 1:[f2<0.71782589] yes=3,no=4,missing=3
# 3:leaf=-0.0658661202
# 4:leaf=1.03587329
# 2:[f0<0.345137954] yes=5,no=6,missing=5
# 5:leaf=0.0854885057
# 6:leaf=-1.15627134
# booster[2]:
# 0:[f2<0.46345675] yes=1,no=2,missing=1
# 1:[f2<0.18197903] yes=3,no=4,missing=3
# 3:leaf=-0.321362823
# 4:leaf=1.05848205
# 2:[f3<0.704104543] yes=5,no=6,missing=5
# 5:leaf=-0.623027325
# 6:leaf=0.46367079
My test input is the first row of the data, which bst.predict(dtrain)[0] scores:
data[0]
array([0.37454012, 0.95071431, 0.73199394, 0.59865848, 0.15601864])
If I sum up the values of the leaves this row falls into, one per tree, I get this:
-0.5618 + 1.0358 - 0.6230 = -0.14899
It should be 0.48283
How do I get the right output value?
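You appear to be dealing with a binary classification problem (integer labels of 0 and 1) trained with the binary:logistic objective. The sum of the leaf values is therefore a score on the log-odds scale, not a probability, so you need to apply the sigmoid function to the boosted score.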
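Written out, the relationship looks like this. The symbols are introduced here for illustration: $w_k(x)$ is the value of the leaf that row $x$ reaches in tree $k$, $s(x)$ is the summed score, and $p(x)$ is the predicted probability. This assumes no extra offset from base_score (a base_score of 0.5 corresponds to a log-odds of 0).

```latex
s(x) = \sum_{k=0}^{2} w_k(x),
\qquad
p(x) = \frac{1}{1 + e^{-s(x)}}
```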
Re-doing your computation with more precision:
import numpy
x = -0.561797738 + 1.03587329 + -0.623027325
1. / (1. + numpy.exp(-x))
This yields 0.46283075.
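To do the whole thing with if statements and additions, a minimal sketch could hard-code the three trees from the dump above and apply the sigmoid at the end. The function names (tree0, tree1, tree2, predict_proba) are made up for this example. It assumes the row has no missing values (the dump sends missing values down the "yes" branch, which this sketch does not handle) and that base_score adds no offset.

```python
import math

# Each function mirrors one booster from the dump: f is a list of
# feature values, and "yes" branches are taken when the test is true.

def tree0(f):  # booster[0]
    if f[3] < 0.905868173:
        if f[0] < 0.0309647173:
            return 0.5
        return -0.561797738
    if f[3] < 0.956529975:
        return 0.909090936
    return -0.5

def tree1(f):  # booster[1]
    if f[2] < 0.863453388:
        if f[2] < 0.71782589:
            return -0.0658661202
        return 1.03587329
    if f[0] < 0.345137954:
        return 0.0854885057
    return -1.15627134

def tree2(f):  # booster[2]
    if f[2] < 0.46345675:
        if f[2] < 0.18197903:
            return -0.321362823
        return 1.05848205
    if f[3] < 0.704104543:
        return -0.623027325
    return 0.46367079

def predict_proba(f):
    # Sum of leaf values is the log-odds score; sigmoid maps it to [0, 1].
    margin = tree0(f) + tree1(f) + tree2(f)
    return 1.0 / (1.0 + math.exp(-margin))

row = [0.37454012, 0.95071431, 0.73199394, 0.59865848, 0.15601864]
print(predict_proba(row))
```

For this row the three trees return the same leaves used in the sum above. To compare against the raw sum directly, bst.predict(dtrain, output_margin=True) should return the untransformed score instead of the probability.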