Tags: machine-learning, loss-function, gradient-descent, lightgbm

Binary log loss in LGBM not as per derivative calculations found online


I am recreating the LightGBM binary log loss function using first- and second-order derivatives calculated with https://www.derivative-calculator.net.

But my plots differ from the plots produced by the original definition used in LightGBM.

Why the difference in the plots? Am I calculating derivatives in the wrong way?

As we know, loss = -y_true log(y_pred) - (1 - y_true) log(1 - y_pred), where y_pred = sigmoid(logits).

Here is what the calculator finds for -y log(1/(1+e^-x)) - (1-y) log(1-1/(1+e^-x)):

d/dx (-y log(1/(1+e^-x)) - (1-y) log(1-1/(1+e^-x))) = -((y-1) e^x + y) / (e^x + 1)

and,

d^2/dx^2 (-y log(1/(1+e^-x)) - (1-y) log(1-1/(1+e^-x))) = e^x / (e^x + 1)^2
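
(Not part of the original derivation: a quick symbolic check, assuming SymPy is available, that the calculator's output reduces to sigmoid(x) - y and sigmoid(x) * (1 - sigmoid(x)).)

import sympy as sp

x, y = sp.symbols('x y')
p = 1 / (1 + sp.exp(-x))                        # sigmoid(x)
loss = -y * sp.log(p) - (1 - y) * sp.log(1 - p)

grad = sp.diff(loss, x)                         # first derivative w.r.t. the logit x
hess = sp.diff(loss, x, 2)                      # second derivative w.r.t. the logit x

print(sp.simplify(grad - (p - y)))              # 0  ->  grad == sigmoid(x) - y
print(sp.simplify(hess - p * (1 - p)))          # 0  ->  hess == sigmoid(x) * (1 - sigmoid(x))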

When I plot the above using this code:

import numpy as np
import pandas as pd

def custom_odds_loss(y_true, y_pred):
    y = y_true
    # ======================
    # Inverse sigmoid
    # ======================
    epsilon_ = 1e-7
    y_pred = np.clip(y_pred, epsilon_, 1 - epsilon_)
    y_pred = np.log(y_pred/(1-y_pred))
    # ======================
    
    # First derivative from the calculator, evaluated at the recovered logit:
    # -((y - 1) e^x + y) / (e^x + 1)
    grad = -((y-1)*np.exp(y_pred)+y)/(np.exp(y_pred)+1)
    # Second derivative from the calculator: e^x / (e^x + 1)^2
    hess = np.exp(y_pred)/(np.exp(y_pred)+1)**2
    
    return grad, hess

# Penalty chart for True 1s all the time
y_true_k = np.ones((1000, 1))
y_pred_k = np.expand_dims(np.linspace(0, 1, 1000), axis=1)
grad, hess = custom_odds_loss(y_true_k, y_pred_k)
data_ = {
    'Payoff@grad': grad.flatten(),
}
pd.DataFrame(data_).plot(title='Target=1(G)|Penalty(y-axis) vs Probability/1000. (x-axis)');
data_ = {
    'Payoff@hess': hess.flatten(),
}
pd.DataFrame(data_).plot(title='Target=1(H)|Penalty(y-axis) vs Probability/1000. (x-axis)');

[Plots: gradient and hessian of the calculator-derived loss, target = 1]

Now, the actual plots using LightGBM's definition:

def custom_odds_loss(y_true, y_pred):
    # ======================
    # Inverse sigmoid
    # ======================
    epsilon_ = 1e-7
    y_pred = np.clip(y_pred, epsilon_, 1 - epsilon_)
    y_pred = np.log(y_pred/(1-y_pred))
    # ======================

    grad = y_pred - y_true
    hess = y_pred * (1. - y_pred)
    return grad, hess

# Penalty chart for True 1s all the time
y_true_k = np.ones((1000, 1))
y_pred_k = np.expand_dims(np.linspace(0, 1, 1000), axis=1)

grad, hess = custom_odds_loss(y_true_k, y_pred_k)

data_ = {
    'Payoff@grad': grad.flatten(),
}
pd.DataFrame(data_).plot(title='Target=1(G)|Penalty(y-axis) vs Probability/1000. (x-axis)');
data_ = {
    'Payoff@hess': hess.flatten(),
}
pd.DataFrame(data_).plot(title='Target=1(H)|Penalty(y-axis) vs Probability/1000. (x-axis)');

[Plots: gradient and hessian from the LightGBM-style definition above, target = 1]


Solution

  • In the second function, you don't need to invert the sigmoid.

    You see, the derivatives you found can be simplified as follows:

    grad = -((y - 1) e^x + y) / (e^x + 1) = e^x / (e^x + 1) - y = sigmoid(x) - y = y_pred - y_true

    hessian = e^x / (e^x + 1)^2 = sigmoid(x) * (1 - sigmoid(x)) = y_pred * (1 - y_pred)

    This simplification means you do not need to invert anything; the gradient and second derivative can be computed directly from the predicted probabilities:

    def custom_odds_loss(y_true, y_pred):
        # y_pred is already a probability (the sigmoid has been applied),
        # so no inverse sigmoid is needed here
        grad = y_pred - y_true
        hess = y_pred * (1. - y_pred)
        return grad, hess
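
    As a quick numerical sanity check (my own addition, assuming NumPy), the calculator form evaluated at the logit and the simplified form evaluated directly on the probability agree:

    import numpy as np

    y_true = np.ones(1000)
    y_prob = np.linspace(1e-7, 1 - 1e-7, 1000)   # predicted probabilities
    logit = np.log(y_prob / (1 - y_prob))        # inverse sigmoid, as in the first function

    # Calculator form, evaluated at the logit
    grad_long = -((y_true - 1) * np.exp(logit) + y_true) / (np.exp(logit) + 1)
    hess_long = np.exp(logit) / (np.exp(logit) + 1) ** 2

    # Simplified form, evaluated directly on the probability
    grad_short = y_prob - y_true
    hess_short = y_prob * (1 - y_prob)

    print(np.allclose(grad_long, grad_short))    # True
    print(np.allclose(hess_long, hess_short))    # True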