
Why does training an XGBoost model with pseudo-Huber loss return a constant test metric?


I am trying to fit an xgboost model using the native pseudo-Huber loss reg:pseudohubererror. However, it doesn't seem to be working, since neither the training nor the test error is improving. It works just fine with reg:squarederror. What am I missing?

Code:

library(xgboost)
n = 1000
X = cbind(runif(n,10,20), runif(n,0,10))
y = X %*% c(2,3) + rnorm(n,0,1)

train = xgb.DMatrix(data  = X[-n,],
                    label = y[-n])

test = xgb.DMatrix(data   = t(as.matrix(X[n,])),
                   label = y[n]) 

watchlist = list(train = train, test = test)

xbg_test = xgb.train(data = train,
                     objective = "reg:pseudohubererror",
                     eval_metric = "mae",
                     watchlist = watchlist,
                     gamma = 1,
                     eta = 0.01,
                     nrounds = 10000,
                     early_stopping_rounds = 100)

Result:

[1] train-mae:44.372692 test-mae:33.085709 
Multiple eval metrics are present. Will use test_mae for early stopping.
Will train until test_mae hasn't improved in 100 rounds.

[2] train-mae:44.372692 test-mae:33.085709 
[3] train-mae:44.372688 test-mae:33.085709 
[4] train-mae:44.372688 test-mae:33.085709 
[5] train-mae:44.372688 test-mae:33.085709 
[6] train-mae:44.372688 test-mae:33.085709 
[7] train-mae:44.372688 test-mae:33.085709 
[8] train-mae:44.372688 test-mae:33.085709 
[9] train-mae:44.372688 test-mae:33.085709 
[10]    train-mae:44.372692 test-mae:33.085709 

Solution

  • It seems that this is the expected behavior of the pseudo-Huber loss. Here I hard-coded the first and second derivatives of the objective loss function and fed them in via the obj = obje parameter. If you run it and compare with the objective = "reg:pseudohubererror" version, you'll see they are the same. As for why it is so much worse than squared loss, I'm not sure; one possible explanation is sketched after the code below.

    set.seed(20)
    
    obje = function(pred, dData) {
      labels = getinfo(dData, "label")
      a = pred    # current predictions
      d = labels  # the labels take the role of the scale parameter in this derivation
      # hard-coded first derivative (gradient) of the pseudo-Huber loss
      fir = a^2 / sqrt(a^2/d^2 + 1) / d - 2 * d * (sqrt(a^2/d^2 + 1) - 1)
      # hard-coded second derivative (hessian)
      sec = ((2 * (a^2/d^2 + 1)^(3/2) - 2) * d^2 - 3 * a^2) / ((a^2/d^2 + 1)^(3/2) * d^2)
      return(list(grad = fir, hess = sec))
    }
    
    xbg_test = xgb.train(data = train,
                         obj = obje,
                         eval_metric = "mae",
                         watchlist = watchlist,
                         gamma = 1,
                         eta = 0.01,
                         nrounds = 10000,
                         early_stopping_rounds = 100)
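
    One possible explanation for the stalled training (my own speculation, not part of the original answer): for a residual z, the pseudo-Huber hessian behaves like (1 + z^2)^(-3/2), which is nearly zero when |z| is large. XGBoost's default base_score is 0.5, far below these labels (roughly 20 to 70), so every initial residual is huge, the per-leaf hessian sums fall below the default min_child_weight of 1, and the trees can never split, leaving the metric frozen. If that diagnosis is right, squared error is unaffected because its hessian is a constant 1 regardless of the residual size. A minimal sketch of the idea, starting the booster near the label mean (xgb_test2 is a name I made up):

    xgb_test2 = xgb.train(data = train,
                          objective = "reg:pseudohubererror",
                          base_score = mean(y[-n]),  # assumption: seed predictions near the label mean
                          eval_metric = "mae",
                          watchlist = watchlist,
                          eta = 0.01,
                          nrounds = 10000,
                          early_stopping_rounds = 100)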