I am recreating the LightGBM binary log loss function using first- and second-order derivatives calculated with https://www.derivative-calculator.net.
But my plots differ from the plots produced by the original definition as found in LightGBM.
Why do the plots differ? Am I calculating the derivatives in the wrong way?
As we know,
loss = -y_true log(y_pred) - (1-y_true) log(1-y_pred)
where y_pred = sigmoid(logits)
Here is what the calculator finds for

-y log(1/(1+e^-x)) - (1-y) log(1-1/(1+e^-x))

First derivative:

-((y-1)e^x + y) / (e^x + 1)

and second derivative:

e^x / (e^x + 1)^2
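To rule out a transcription mistake on my side, here is a quick finite-difference check of those two expressions against the loss itself (this snippet is my own addition, not from the calculator):

import numpy as np

def loss(x, y):
    # binary log loss as a function of the raw logit x
    p = 1 / (1 + np.exp(-x))
    return -y * np.log(p) - (1 - y) * np.log(1 - p)

x = np.linspace(-4, 4, 9)
y = 1.0

# the calculator's expressions
grad = -((y - 1) * np.exp(x) + y) / (np.exp(x) + 1)
hess = np.exp(x) / (np.exp(x) + 1) ** 2

# central finite differences of the loss
h = 1e-4
grad_fd = (loss(x + h, y) - loss(x - h, y)) / (2 * h)
hess_fd = (loss(x + h, y) - 2 * loss(x, y) + loss(x - h, y)) / h ** 2

print(np.allclose(grad, grad_fd, atol=1e-6), np.allclose(hess, hess_fd, atol=1e-5))
# True True

So the expressions themselves match the loss; the problem must be elsewhere.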
When I plot the above using this code,
import numpy as np
import pandas as pd

def custom_odds_loss(y_true, y_pred):
    y = y_true
    # ======================
    # Inverse sigmoid: turn probabilities back into raw logits
    # ======================
    epsilon_ = 1e-7
    y_pred = np.clip(y_pred, epsilon_, 1 - epsilon_)
    y_pred = np.log(y_pred / (1 - y_pred))
    # ======================
    # Derivatives from the calculator, evaluated at the logit
    grad = -((y - 1) * np.exp(y_pred) + y) / (np.exp(y_pred) + 1)
    hess = np.exp(y_pred) / (np.exp(y_pred) + 1) ** 2
    return grad, hess
# Penalty chart for True 1s all the time
y_true_k = np.ones((1000, 1))
y_pred_k = np.expand_dims(np.linspace(0, 1, 1000), axis=1)
grad, hess = custom_odds_loss(y_true_k, y_pred_k)

data_ = {
    'Payoff@grad': grad.flatten(),
}
pd.DataFrame(data_).plot(title='Target=1(G)|Penalty(y-axis) vs Probability/1000. (x-axis)');
data_ = {
    'Payoff@hess': hess.flatten(),
}
pd.DataFrame(data_).plot(title='Target=1(H)|Penalty(y-axis) vs Probability/1000. (x-axis)');
Now, the actual plot using LightGBM's definition:
def custom_odds_loss(y_true, y_pred):
    # ======================
    # Inverse sigmoid
    # ======================
    epsilon_ = 1e-7
    y_pred = np.clip(y_pred, epsilon_, 1 - epsilon_)
    y_pred = np.log(y_pred / (1 - y_pred))
    # ======================
    grad = y_pred - y_true
    hess = y_pred * (1. - y_pred)
    return grad, hess
# Penalty chart for True 1s all the time
y_true_k = np.ones((1000, 1))
y_pred_k = np.expand_dims(np.linspace(0, 1, 1000), axis=1)
grad, hess = custom_odds_loss(y_true_k, y_pred_k)

data_ = {
    'Payoff@grad': grad.flatten(),
}
pd.DataFrame(data_).plot(title='Target=1(G)|Penalty(y-axis) vs Probability/1000. (x-axis)');
data_ = {
    'Payoff@hess': hess.flatten(),
}
pd.DataFrame(data_).plot(title='Target=1(H)|Penalty(y-axis) vs Probability/1000. (x-axis)');
In the second function, you don't need to invert the sigmoid.

You see, the derivatives you found can be simplified as follows:

-((y-1)e^x + y) / (e^x + 1) = 1/(1+e^-x) - y = y_pred - y_true

e^x / (e^x + 1)^2 = 1/(1+e^-x) * (1 - 1/(1+e^-x)) = y_pred * (1 - y_pred)

because 1/(1+e^-x) is exactly the sigmoid of the logit x, i.e. the predicted probability y_pred. This simplification means we don't have to invert anything and can compute the gradient and second derivative directly from the probabilities:
def custom_odds_loss(y_true, y_pred):
    # y_pred is already a probability here, so no inverse sigmoid is needed
    grad = y_pred - y_true
    hess = y_pred * (1. - y_pred)
    return grad, hess
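As a quick sanity check (my own sketch; the helper names loss_via_logits and loss_simplified exist only for this comparison), the simplified version returns the same values as your first, inverse-sigmoid version whenever y_pred is a probability:

import numpy as np

def loss_via_logits(y_true, y_pred):
    # your first function: invert the sigmoid, then apply the calculator's formulas
    eps = 1e-7
    p = np.clip(y_pred, eps, 1 - eps)
    x = np.log(p / (1 - p))
    grad = -((y_true - 1) * np.exp(x) + y_true) / (np.exp(x) + 1)
    hess = np.exp(x) / (np.exp(x) + 1) ** 2
    return grad, hess

def loss_simplified(y_true, y_pred):
    # the simplified form: work on probabilities directly
    grad = y_pred - y_true
    hess = y_pred * (1. - y_pred)
    return grad, hess

y_true = np.ones(1000)
y_pred = np.linspace(0.001, 0.999, 1000)

g1, h1 = loss_via_logits(y_true, y_pred)
g2, h2 = loss_simplified(y_true, y_pred)
print(np.allclose(g1, g2), np.allclose(h1, h2))  # True True

The bug in your second function was applying grad = y_pred - y_true and hess = y_pred * (1 - y_pred) to the logits after inverting the sigmoid; those formulas already expect probabilities.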