I have recently been learning the LightGBM package and want to tune its parameters.
I would like to try every parameter that can be tuned in LightGBM.
One question is: when I build the model with the function lightgbm(data, label = NULL, weight = NULL, params = list(), nrounds = 10, verbose = 1), can I put weight, nrounds, and many other parameters into a list object and pass it to the params argument?
The following code is what I used:
# input data for lgb.Dataset()
data_lgb <- lgb.Dataset(
  data = X_tr,
  label = y_tr
)
# can I put all parameters to be tuned into this list?
params_list <- list(weight = NULL, nrounds = 20, verbose = 1, learning_rate = 0.1)
# build lightgbm model using only: data_lgb and params_list
lgb_model <- lightgbm(data_lgb, params = params_list)
Can I do this using the above code?
I ask because I have a large training data set (2 million rows and 700 features). If I call lgb.Dataset() inside lightgbm(), as in lightgbm(data = lgb.Dataset(data = X_tr, label = y_tr), params = params_list), the dataset is rebuilt for every model, which takes a long time when building many models. So I build the dataset once up front; it stays constant across models, and only the parameters change.
However, I am not sure how many parameters, in total, can be put into params_list. For example, can the weight parameter go into params_list? When I look at the help page ?lightgbm, I notice that weight and many other parameters are outside of the params argument.
Can you help me figure out which parameters, in total, can be put into params_list? That is, is it feasible to build the final model using only the data argument and the params argument (with all other parameters placed in the params list object), as shown above?
Thank you.
LightGBM has many parameters that you can tune; the Parameters page of the documentation lists them.
I am pasting part of one of my model scripts, which shows the process. It should be a good hint for you.
nthread <- as.integer(future::availableCores())
seed <- 1000
EARLY_STOPPING <- 50
nrounds <- 1000
param <- list(objective = "regression",
              metric = "rmse",
              max_depth = 3,
              num_leaves = 5,
              learning_rate = 0.1,
              nthread = nthread,
              bagging_fraction = 0.7,
              feature_fraction = 0.7,
              bagging_freq = 5,
              bagging_seed = seed,
              verbosity = -1,
              min_data_in_leaf = 5)
dtrain <- lgb.Dataset(data = as.matrix(train_X),
                      label = train_y)
dval <- lgb.Dataset(data = as.matrix(val_X),
                    label = val_y)
valids <- list(val = dval)
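# data_random_seed is a LightGBM parameter, so it is merged into the params
# list; recent lightgbm releases reject extra named arguments to lgb.train().
bst <- lgb.train(params = c(param, list(data_random_seed = seed)),
                 data = dtrain,
                 nrounds = nrounds,
                 early_stopping_rounds = EARLY_STOPPING,
                 valids = valids)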
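To build the dataset once and reuse it across several parameter settings, something along these lines may work. It is only a minimal sketch on synthetic data: w_tr, base_params and param_grid are made-up names for illustration, and X_tr / y_tr are random stand-ins for the real training data. Note that weight is attached to the lgb.Dataset object rather than to the params list, and nrounds is passed as its own argument here.

```r
library(lightgbm)

# Synthetic stand-ins for X_tr / y_tr (assumption: the real data is a numeric matrix)
set.seed(1)
X_tr <- matrix(rnorm(500 * 5), ncol = 5)
y_tr <- X_tr[, 1] + rnorm(500, sd = 0.1)
w_tr <- rep(1, 500)  # hypothetical per-row weights

# Build the dataset once; weight belongs to the dataset, not to params
dtrain <- lgb.Dataset(data = X_tr, label = y_tr, weight = w_tr)

# Settings shared by every model
base_params <- list(objective = "regression", metric = "rmse", verbosity = -1)

# Hypothetical grid: only tree-level parameters vary between models
param_grid <- list(
  list(learning_rate = 0.1, num_leaves = 7),
  list(learning_rate = 0.05, num_leaves = 15)
)

# Train one model per setting, reusing the same dtrain each time
models <- lapply(param_grid, function(p) {
  lgb.train(params = c(base_params, p),
            data = dtrain,
            nrounds = 20)
})
```

Only learning-related parameters are varied in this sketch. Parameters that affect how the dataset itself is built, such as max_bin, are probably best kept fixed when the same lgb.Dataset is reused.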