I am currently trying to find the optimal parameters of an XGBoost model. After finding them, I would like to evaluate the model with cross-validation using multiple custom evaluation metrics.
Let's assume I want to use the following two metrics. (My real metrics are different, but the first one comes from the documentation and I just want to learn how to use two metrics at once.)
from typing import Tuple
import numpy as np
import xgboost as xgb

def rmsle(predt: np.ndarray, dtrain: xgb.DMatrix) -> Tuple[str, float]:
    ''' Root mean squared log error metric.'''
    y = dtrain.get_label()
    # Clamp predictions so that log1p stays defined.
    predt[predt < -1] = -1 + 1e-6
    elements = np.power(np.log1p(y) - np.log1p(predt), 2)
    return 'PyRMSLE', float(np.sqrt(np.sum(elements) / len(y)))
def rmsle2(predt: np.ndarray, dtrain: xgb.DMatrix) -> Tuple[str, float]:
    ''' Twice the root mean squared log error (dummy second metric).'''
    y = dtrain.get_label()
    predt[predt < -1] = -1 + 1e-6
    elements = np.power(np.log1p(y) - np.log1p(predt), 2)
    # Use a distinct name so the two metrics can be told apart in the results.
    return 'PyRMSLE2', float(2*np.sqrt(np.sum(elements) / len(y)))
Now I use the following call to run the cross-validation:
cvresult = xgb.cv(xgb_param, xgtrain, num_boost_round=alg.get_params()['n_estimators'], folds=cv,
                  feval={rmsle,rmsle2}, early_stopping_rounds=early_stopping_rounds)
Unfortunately this is not working.
If I pass only one custom metric, feval=rmsle, it works.
I can use two 'standard metrics' like the RMSE or MAE:
cvresult = xgb.cv(xgb_param, xgtrain, num_boost_round=alg.get_params()['n_estimators'], folds=cv,
                  metrics={'mae','rmse'}, early_stopping_rounds=early_stopping_rounds)
This raises no error, but as soon as I pass more than one custom metric, I get an error.
It would be amazing if anyone could provide me some help here. Thank you very much.
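One possible workaround, as a minimal sketch: feval takes a single callable, not a set, so both values can be computed inside one function that returns a list of (name, value) pairs. This assumes the installed xgboost version accepts a list return from a custom metric; I have not checked this against every release. The names combined_metrics, _rmsle_value and HasLabels are made up for illustration, and the Protocol only stands in for xgb.DMatrix so the snippet does not depend on xgboost itself.

```python
from typing import List, Protocol, Tuple

import numpy as np


class HasLabels(Protocol):
    """Anything exposing get_label(), e.g. an xgb.DMatrix (assumption)."""

    def get_label(self) -> np.ndarray: ...


def _rmsle_value(predt: np.ndarray, y: np.ndarray) -> float:
    # Clamp on a copy so the caller's prediction array is left unchanged.
    clipped = np.where(predt < -1, -1 + 1e-6, predt)
    elements = np.power(np.log1p(y) - np.log1p(clipped), 2)
    return float(np.sqrt(np.sum(elements) / len(y)))


def combined_metrics(predt: np.ndarray, dtrain: HasLabels) -> List[Tuple[str, float]]:
    """Return both metrics from one callable, each under its own name."""
    y = dtrain.get_label()
    base = _rmsle_value(predt, y)
    return [("PyRMSLE", base), ("PyRMSLE2", 2 * base)]
```

The call would then use feval=combined_metrics instead of feval={rmsle,rmsle2}. With early_stopping_rounds set, which of the returned metrics drives the stopping decision may depend on the xgboost version, so that is worth verifying before relying on it.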
In the end I did it just with:
cross_validate(xgb1, X, y, scoring=scorer, cv=KFold(n_splits=cv_folds, random_state=seed, shuffle=True), verbose = 0)
and
scorer = {'MAE': make_scorer(MAE, greater_is_better=False),
          'MAPE': make_scorer(MAPE, greater_is_better=False),
          'MdAE': make_scorer(MdAE, greater_is_better=False),
          'MdAPE': make_scorer(MdAPE, greater_is_better=False),
          'In_10': make_scorer(In_10, greater_is_better=True),
          'In_20': make_scorer(In_20, greater_is_better=True)}
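The metric functions themselves are not shown above. Here is a sketch of what they might look like, assuming the names stand for mean/median absolute (percentage) error and for the share of predictions within 10% or 20% of the true value. These definitions are guesses for illustration, not the exact ones used, and they assume y_true contains no zeros.

```python
import numpy as np


def _abs_err(y_true, y_pred) -> np.ndarray:
    # Element-wise absolute error.
    return np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))


def _abs_pct_err(y_true, y_pred) -> np.ndarray:
    # Element-wise absolute error relative to the true value (y_true must be nonzero).
    return _abs_err(y_true, y_pred) / np.abs(np.asarray(y_true, dtype=float))


def MAE(y_true, y_pred) -> float:
    return float(np.mean(_abs_err(y_true, y_pred)))


def MAPE(y_true, y_pred) -> float:
    return float(np.mean(_abs_pct_err(y_true, y_pred)))


def MdAE(y_true, y_pred) -> float:
    return float(np.median(_abs_err(y_true, y_pred)))


def MdAPE(y_true, y_pred) -> float:
    return float(np.median(_abs_pct_err(y_true, y_pred)))


def In_10(y_true, y_pred) -> float:
    # Assumed meaning: fraction of predictions within 10% of the true value.
    return float(np.mean(_abs_pct_err(y_true, y_pred) <= 0.10))


def In_20(y_true, y_pred) -> float:
    # Assumed meaning: fraction of predictions within 20% of the true value.
    return float(np.mean(_abs_pct_err(y_true, y_pred) <= 0.20))
```

Each function takes (y_true, y_pred), which is the signature make_scorer expects. Note that scorers built with greater_is_better=False have their sign flipped, so cross_validate reports entries such as test_MAE as negative numbers.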