R: Can I pass the weight parameter into the params = list() in LightGBM


I am currently learning the LightGBM package and want to tune its parameters.

I want to try all the parameters that can be tuned in LightGBM.

One question is: when I build the model using the function lightgbm(data, label = NULL, weight = NULL, params = list(), nrounds = 10, verbose = 1), can I put weight, nrounds, and the many other parameters into a list object and feed it to the params argument?

The following code is what I used:

# input data for lgb.Dataset() 
data_lgb <- lgb.Dataset(
  data = X_tr,
  label = y_tr
)

# can I put all parameters to be tuned into this list?
params_list <- list(weight = NULL, nrounds = 20, verbose = 1, learning_rate = 0.1)

# build lightgbm model using only: data_lgb and params_list
lgb_model <- lightgbm(data_lgb, params = params_list)

Can I do this using the above code?

I ask because I have a large training data set (2 million rows and 700 features). If I pass lgb.Dataset() directly into lightgbm(), as in lightgbm(data = lgb.Dataset(data = X_tr, label = y_tr), params = params_list), then building multiple models takes a long time. Therefore, I first create the dataset used by lightgbm once; since the dataset is constant across models, I can focus only on the different parameters.
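
For example, what I have in mind is something like the sketch below (the parameter values and the name param_grid are only illustrative):

library(lightgbm)

# build the lgb.Dataset once; it stays the same for every model
data_lgb <- lgb.Dataset(data = X_tr, label = y_tr)

# candidate parameter lists (values are only illustrative)
param_grid <- list(
  list(objective = "regression", learning_rate = 0.10, num_leaves = 31),
  list(objective = "regression", learning_rate = 0.05, num_leaves = 63)
)

# reuse the same dataset and only swap the parameter list
models <- lapply(param_grid, function(p) {
  lightgbm(data = data_lgb, params = p, nrounds = 20, verbose = 1)
})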

However, I am not sure how many parameters in total can be put into the params_list. For example, can the weight parameter be included in the params_list? When I look at the help via ?lightgbm, I notice that the weight parameter and many other parameters are outside of the params_list.

Can you help me figure out which parameters can be put into the params_list? That is, is it feasible to build the final model using only the data argument and the params argument (with the other parameters placed into the params list object), as shown above?

Thank you.


Solution

  • LightGBM has many parameters that you can tune. Please read the documentation.

    I am pasting a portion of one of my model scripts, which shows the process. It should be a good hint for you.

    nthread <- as.integer(future::availableCores())
    seed <- 1000
    EARLY_STOPPING <- 50
    nrounds <- 1000
    param <- list(objective = "regression",
                  metric = "rmse",
                  max_depth = 3,
                  num_leaves = 5,
                  learning_rate = 0.1,
                  nthread = nthread,
                  bagging_fraction = 0.7,
                  feature_fraction = 0.7,
                  bagging_freq = 5,
                  bagging_seed = seed,
                  verbosity = -1,
                  min_data_in_leaf = 5)
    
    
    dtrain <- lgb.Dataset(data = as.matrix(train_X),
                        label = train_y)
    
    dval <- lgb.Dataset(data = as.matrix(val_X),
                        label = val_y)
    valids <- list(val = dval)
    bst <- lgb.train(param,
                     data = dtrain,
                     nrounds = nrounds,
                     data_random_seed = seed,
                     early_stopping_rounds = EARLY_STOPPING,
                     valids = valids)
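
    Regarding the weight argument specifically: as far as I know it describes the data rather than the booster, so in the R package it is attached to the lgb.Dataset instead of being placed in the params list, and nrounds likewise stays a separate argument of lightgbm()/lgb.train(). A minimal sketch (train_w is an illustrative vector of per-row weights, and the exact setter may differ between lightgbm versions):

    # attach per-row weights to the Dataset itself, not to params
    dtrain <- lgb.Dataset(data = as.matrix(train_X),
                          label = train_y,
                          weight = train_w)

    # alternatively, set the weights after construction; older releases use
    # setinfo(dtrain, "weight", train_w), newer ones use set_field()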