Tags: numpy, machine-learning, entropy, derivative, xgboost

How are the gradient and hessian of the logarithmic loss computed in the custom objective function example script in xgboost's GitHub repository?


I would like to understand how the gradient and hessian of the logloss function are computed in an xgboost sample script.

I've simplified the function to take numpy arrays and generated y_hat and y_true, which are a sample of the values used in the script.

Here is a simplified example:

import numpy as np


def loglikelihoodloss(y_hat, y_true):
    # predicted probability from the raw score via the sigmoid
    prob = 1.0 / (1.0 + np.exp(-y_hat))
    # first- and second-order terms returned to xgboost
    grad = prob - y_true
    hess = prob * (1.0 - prob)
    return grad, hess

y_hat = np.array([1.80087972, -1.82414818, -1.82414818,  1.80087972, -2.08465433,
                  -1.82414818, -1.82414818,  1.80087972, -1.82414818, -1.82414818])
y_true = np.array([1.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.,  0.])

loglikelihoodloss(y_hat, y_true)

The log loss function is the sum of $-\left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]$ over all samples, where $p_i = \frac{1}{1 + e^{-\hat{y}_i}}$.

The gradient (with respect to $p$) is then $\frac{p - y}{p(1 - p)}$, however in the code it is $p - y$.

Likewise the second derivative (with respect to $p$) is $\frac{y}{p^2} + \frac{1 - y}{(1 - p)^2}$, however in the code it is $p(1 - p)$.

How are the equations equal?


Solution

  • The log loss is the negative of the log-likelihood, which is given as:

    $\ell = \sum_{i} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]$

    where

    $p_i = \frac{1}{1 + e^{-\hat{y}_i}}$

    The key point is that the derivative is taken with respect to the raw score $\hat{y}_i$ (the margin value xgboost passes to the objective), not with respect to $p_i$. Taking the partial derivative of $\ell$ with respect to $\hat{y}_i$ we get the gradient as

    $\frac{\partial \ell}{\partial \hat{y}_i} = y_i - p_i$

    Thus we get the negative of the gradient, i.e. the gradient of the log loss, as $p_i - y_i$, which is exactly `prob - y_true` in the code; the chain-rule step is written out below.

    Similar calculations can be done to obtain the hessian: differentiating $p_i - y_i$ once more with respect to $\hat{y}_i$ gives $p_i (1 - p_i)$, which is `prob * (1.0 - prob)` in the code.
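
    For completeness, here is the chain-rule step, worked out per sample, that reconciles the derivative with respect to $p$ from the question with the expression used in the code. For the per-sample log loss $L = -\left[ y \log(p) + (1 - y) \log(1 - p) \right]$ with $p = \sigma(\hat{y}) = \frac{1}{1 + e^{-\hat{y}}}$ and $\frac{\partial p}{\partial \hat{y}} = p(1 - p)$:

    $$
    \frac{\partial L}{\partial \hat{y}}
      = \frac{\partial L}{\partial p} \cdot \frac{\partial p}{\partial \hat{y}}
      = \frac{p - y}{p(1 - p)} \cdot p(1 - p)
      = p - y
    $$

    $$
    \frac{\partial^2 L}{\partial \hat{y}^2}
      = \frac{\partial (p - y)}{\partial \hat{y}}
      = p(1 - p)
    $$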
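
    As a quick sanity check (this is not part of the original xgboost script, just a sketch using numpy), the analytic gradient and hessian can be compared against central finite differences of the log loss taken with respect to y_hat:

    import numpy as np

    def logloss(y_hat, y_true):
        # Per-sample log loss as a function of the raw score y_hat.
        prob = 1.0 / (1.0 + np.exp(-y_hat))
        return -(y_true * np.log(prob) + (1.0 - y_true) * np.log(1.0 - prob))

    y_hat = np.array([1.80087972, -1.82414818, -1.82414818,  1.80087972, -2.08465433,
                      -1.82414818, -1.82414818,  1.80087972, -1.82414818, -1.82414818])
    y_true = np.array([1.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.,  0.])

    eps = 1e-4  # step size for the central differences

    # Numerical first and second derivatives with respect to y_hat (not prob).
    grad_num = (logloss(y_hat + eps, y_true) - logloss(y_hat - eps, y_true)) / (2 * eps)
    hess_num = (logloss(y_hat + eps, y_true) - 2 * logloss(y_hat, y_true)
                + logloss(y_hat - eps, y_true)) / eps ** 2

    # Analytic expressions from the sample script: both comparisons should print True.
    prob = 1.0 / (1.0 + np.exp(-y_hat))
    print(np.allclose(grad_num, prob - y_true, atol=1e-6))
    print(np.allclose(hess_num, prob * (1.0 - prob), atol=1e-4))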