
Error in data$update_params(params = params) : [LightGBM] [Fatal] Cannot change max_bin after constructed Dataset handle


I downloaded the lightgbm package in RStudio and am trying to run a model with it. The script is based on Retip.

The function is:

> fit.lightgbm
function (training, testing) 
{
  train <- as.matrix(training)
  test <- as.matrix(testing)
  coltrain <- ncol(train)
  coltest <- ncol(test)
  dtrain <- lightgbm::lgb.Dataset(train[, 2:coltrain], label = train[, 1])
  lightgbm::lgb.Dataset.construct(dtrain)
  dtest <- lightgbm::lgb.Dataset.create.valid(dtrain, test[,2:coltest], label = test[, 1])
  valids <- list(test = dtest)
  params <- list(objective = "regression", metric = "rmse")
  modelcv <- lightgbm::lgb.cv(params, dtrain, nrounds = 5000, 
                              nfold = 10, valids, verbose = 1, early_stopping_rounds = 1000, 
                              record = TRUE, eval_freq = 1L, stratified = TRUE, max_depth = 4, 
                              max_leaf = 20, max_bin = 50)
  best.iter <- modelcv$best_iter
  params <- list(objective = "regression_l2", metric = "rmse")
  model <- lightgbm::lgb.train(params, dtrain, nrounds = best.iter, 
                               valids, verbose = 0, early_stopping_rounds = 1000, record = TRUE, 
                               eval_freq = 1L, max_depth = 4, max_leaf = 20, max_bin = 50)
  print(paste0("End training"))
  return(model)
}

However, when I try to run the function as shown in Retip:

lightgbm <- fit.lightgbm(training,testing)

I get this fatal error:

Error in data$update_params(params = params) : 
  [LightGBM] [Fatal] Cannot change max_bin after constructed Dataset handle. 

The error goes away only when I change max_bin to max_bin=255.
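A stripped-down reproduction can help isolate this behaviour. The sketch below is not the Retip code: it uses hypothetical synthetic data (`X`, `y`) and hypothetical helper names (`constructed_dataset()`, `try_max_bin()`), and it assumes default `lgb.Dataset()` settings, under which an explicitly constructed Dataset is binned with LightGBM's default max_bin of 255. The exact error text may differ between lightgbm versions.

```r
library(lightgbm)

# Hypothetical synthetic data standing in for the Retip training matrix.
set.seed(42)
X <- matrix(rnorm(500 * 5), ncol = 5)
y <- X[, 1] + rnorm(500, sd = 0.1)

# Build and explicitly construct a Dataset without setting max_bin,
# so it is binned with LightGBM's default max_bin (255).
constructed_dataset <- function() {
  d <- lgb.Dataset(X, label = y)
  lgb.Dataset.construct(d)
  d
}

# Train on a freshly constructed Dataset while requesting a given max_bin.
# Returns NULL on success, or the error message on failure.
try_max_bin <- function(max_bin) {
  tryCatch({
    lgb.train(
      params = list(objective = "regression", max_bin = max_bin),
      data = constructed_dataset(),
      nrounds = 5L,
      verbose = -1L
    )
    NULL
  }, error = function(e) conditionMessage(e))
}

err_50 <- try_max_bin(50L)    # differs from the constructed value: expected to fail
err_255 <- try_max_bin(255L)  # matches the constructed value: expected to train
print(err_50)
print(is.null(err_255))
```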

I went through these related LightGBM issues:

What is the right way for hyper parameter tuning for LightGBM classification? #1339

[Python] max_bin weird behaviour #1053

Any ideas or suggestions on what should be done?


Solution

  • This was cross-posted to https://github.com/microsoft/LightGBM/issues/4019 and has been answered there.

    Construction of the Dataset object in LightGBM handles some important pre-processing steps (see this prior answer) that happen before training, such as bucketing each feature's values into at most max_bin bins. None of the Dataset parameters can be changed after construction.

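    Passing max_bin=50 to lgb.Dataset() (for example as params = list(max_bin = 50)) instead of to lgb.cv() / lgb.train() in the original post's code will result in successful training without this error. In that code, lgb.Dataset.construct(dtrain) builds the Dataset with the default max_bin of 255 before lgb.cv() is called, which is why max_bin=50 fails while max_bin=255, matching the value already in use, does not.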
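Applied to the question's function, the change might look like the sketch below. It is one possible rewrite, not the Retip authors' code, and it assumes (as the original does) that column 1 of `training`/`testing` is the target and every other column is a numeric feature. Besides moving max_bin onto lgb.Dataset(), it puts max_depth and max_leaf into the params list (passing extra parameters through `...` is deprecated in newer lightgbm releases), writes max_leaf under its canonical name num_leaves, reuses one params list because "regression_l2" is an alias of "regression", and drops the positional valids argument from lgb.cv(), which builds its own folds from dtrain.

```r
library(lightgbm)

# Sketch of fit.lightgbm with max_bin set on the Dataset instead of in lgb.cv()/lgb.train().
# Assumes column 1 is the target and the remaining columns are numeric features.
fit.lightgbm <- function(training, testing) {
  train <- as.matrix(training)
  test <- as.matrix(testing)

  # Dataset-level parameters such as max_bin must be set here, before construction.
  dtrain <- lightgbm::lgb.Dataset(
    train[, -1, drop = FALSE],
    label = train[, 1],
    params = list(max_bin = 50L)
  )
  lightgbm::lgb.Dataset.construct(dtrain)

  # The validation set takes its bin boundaries from dtrain (its reference).
  dtest <- lightgbm::lgb.Dataset.create.valid(
    dtrain, test[, -1, drop = FALSE], label = test[, 1]
  )

  # Booster-level parameters; max_leaf in the original is an alias of num_leaves.
  params <- list(
    objective = "regression",
    metric = "rmse",
    max_depth = 4L,
    num_leaves = 20L
  )

  # lgb.cv() creates its own folds from dtrain, so no valids list is passed.
  modelcv <- lightgbm::lgb.cv(
    params = params,
    data = dtrain,
    nrounds = 5000L,
    nfold = 10L,
    verbose = 1L,
    early_stopping_rounds = 1000L,
    record = TRUE,
    eval_freq = 1L,
    stratified = TRUE
  )
  best.iter <- modelcv$best_iter

  # Final model trained for the number of rounds selected by cross-validation.
  model <- lightgbm::lgb.train(
    params = params,
    data = dtrain,
    nrounds = best.iter,
    valids = list(test = dtest),
    verbose = 0L,
    early_stopping_rounds = 1000L,
    record = TRUE,
    eval_freq = 1L
  )
  print("End training")
  model
}
```

It is called exactly as in the question, `lightgbm <- fit.lightgbm(training, testing)`. Repeating `max_bin = 50L` inside the params list given to lgb.cv() or lgb.train() should also be harmless, since the check only fails when the requested value differs from the one the Dataset was built with.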