I'm trying to build a regression model in R using LightGBM, and I'm getting a bit confused about some functions and when/how to use them.
The first is what I've written in the title: what's the difference between lgb.train() and lightgbm()?
The description in the documentation (https://cran.r-project.org/web/packages/lightgbm/lightgbm.pdf) says that lgb.train is 'Logic to train with LightGBM' and lightgbm is a 'Simple interface for training a LightGBM model', yet both return an lgb.Booster, a trained model. One difference I've found is that lgb.train() does not work with valids =, while lightgbm() does.
The second is about the function lgb.cv(), which does cross-validation in LightGBM. How do you apply the output of lgb.cv() to a model? As I understood from the documentation linked above, the output of both lgb.cv and lgb.train is a model. Is it correct to use it like the example below?
lgbcv <- lgb.cv(params,
                lgbtrain,
                nrounds = 1000,
                nfold = 5,
                early_stopping_rounds = 100,
                learning_rate = 1.0)

lgbcv <- lightgbm(params,
                  lgbtrain,
                  nrounds = 1000,
                  early_stopping_rounds = 100,
                  learning_rate = 1.0)
Thank you in advance!
what's the difference between lgb.train() and lightgbm()?
These functions both train a LightGBM model; they're just slightly different interfaces. The biggest difference is in how the training data are prepared. LightGBM training requires a special LightGBM-specific representation of the training data, called a Dataset. To use lgb.train(), you have to construct one of these beforehand with lgb.Dataset(). lightgbm(), on the other hand, can accept a data frame, data.table, or matrix and will create the Dataset object for you.
Choose whichever method you feel has a more friendly interface... both will produce a single trained LightGBM model (class "lgb.Booster").
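For example, here is a minimal sketch of both interfaces side by side, using the agaricus data that ships with the package (the regression objective on this binary-label data is just for illustration):

library(lightgbm)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train

# lgb.train() needs a Dataset built ahead of time with lgb.Dataset()
dtrain <- lgb.Dataset(train$data, label = train$label)
bst1 <- lgb.train(
    params = list(objective = "regression", metric = "l2")
    , data = dtrain
    , nrounds = 5L
)

# lightgbm() takes the raw feature matrix and label directly,
# building the Dataset internally
bst2 <- lightgbm(
    data = train$data
    , label = train$label
    , params = list(objective = "regression", metric = "l2")
    , nrounds = 5L
)

Both bst1 and bst2 are "lgb.Booster" objects that you can pass to predict().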
that lgb.train() does not work with valids =, while lightgbm() does.
This is not correct. Both functions accept the keyword argument valids. Run ?lgb.train and ?lightgbm for documentation on those methods.
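For instance, here's a sketch of passing a validation set to lgb.train() through valids; the dvalid name and the agaricus data are just for illustration, and the same argument works with lightgbm():

library(lightgbm)
data(agaricus.train, package = "lightgbm")
data(agaricus.test, package = "lightgbm")
dtrain <- lgb.Dataset(agaricus.train$data, label = agaricus.train$label)
# a validation Dataset must be created from the training Dataset
dvalid <- lgb.Dataset.create.valid(dtrain, agaricus.test$data, label = agaricus.test$label)
bst <- lgb.train(
    params = list(objective = "regression", metric = "l2")
    , data = dtrain
    , nrounds = 10L
    , valids = list(valid = dvalid)
    , early_stopping_rounds = 3L
)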
How do you apply the output of lgb.cv() to a model?
I'm not sure what you mean, but you can find an example of how to use lgb.cv() in the docs that show up when you run ?lgb.cv.
library(lightgbm)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
params <- list(objective = "regression", metric = "l2")
model <- lgb.cv(
    params = params
    , data = dtrain
    , nrounds = 5L
    , nfold = 3L
    , min_data = 1L
    , learning_rate = 1.0
)
This returns an object of class "lgb.CVBooster". That object has multiple "lgb.Booster" objects in it (the trained models that lightgbm() or lgb.train() produce).
You can extract any one of these from model$boosters. However, in practice I don't recommend using the models from lgb.cv() directly. The goal of cross-validation is to get an estimate of the generalization error for a model. So you can use lgb.cv() to figure out the expected error for a given dataset + set of parameters (by looking at model$record_evals and model$best_score).
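Continuing from the lgb.cv() example above, here is a sketch of that workflow; the "pick the best round count from CV, then refit on all the data" step is a common pattern, not something lgb.cv() does for you:

# mean l2 across folds, one entry per boosting round
cv_scores <- unlist(model$record_evals$valid$l2$eval)
length(model$boosters)  # one "lgb.Booster" per fold (3 here)

# refit a single final model on all the data, using the round
# count that scored best in cross-validation
best_nrounds <- which.min(cv_scores)
final_model <- lgb.train(
    params = params
    , data = dtrain
    , nrounds = best_nrounds
)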