Hi everyone.
First, here is a sample of the data:
> str(train)
'data.frame': 30226 obs. of 71 variables:
$ sal : int 2732 2732 2732 2328 2560 3584 5632 5632 3584 2150 ...
$ avg : num 2392 2474 2392 2561 2763 ...
$ med : num 2314 2346 2314 2535 2754 ...
$ jt_category_1 : int 1 1 1 1 1 1 1 1 1 1 ...
$ jt_category_2 : int 0 0 0 0 0 0 0 0 0 0 ...
$ job_num_1 : int 0 0 0 0 0 0 0 0 0 0 ...
$ job_num_2 : int 0 0 0 0 0 0 0 0 0 0 ...
plus 64 more variables (all of type int, binary 0 or 1 values).
The column "sal" is the label, and this is the test data (70% of the raw data).
I use the "caret" package in R for regression and chose the method "xgbTree". I know it works for both classification and regression.
The issue is that I want to do regression, but I don't know how.
When I execute the full code, the error is
Error: Metric RMSE not applicable for classification models
But I'm not trying to do classification. I want to do regression.
The type of my label (the y argument of the train function) is int,
and I also checked the data type.
Is that wrong? Does it make caret treat this training as classification?
> str(train$sal)
int [1:30226] 2732 2732 2732 2328 2560 3584 5632 5632 3584 2150 ...
> str(train_xg)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
..@ i : int [1:181356] 0 1 2 3 4 5 6 7 8 9 ...
..@ p : int [1:71] 0 30226 60452 90504 90678 90709 90962 93875 95087 96190 ...
..@ Dim : int [1:2] 30226 70
..@ Dimnames:List of 2
.. ..$ : NULL
.. ..$ : chr [1:70] "avg" "med" "jt_category_1" "jt_category_2" ...
..@ x : num [1:181356] 2392 2474 2392 2561 2763 ...
..@ factors : list()
Why does caret misrecognize it?
Do you know how to perform regression with xgboost and caret?
Thank you in advance.
The full code is here:
library(caret)
library(xgboost)
xgb_grid_1 = expand.grid(
nrounds = 1000,
max_depth = c(2, 4, 6, 8, 10),
eta=c(0.5, 0.1, 0.07),
gamma = 0.01,
colsample_bytree=0.5,
min_child_weight=1,
subsample=0.5
)
xgb_trcontrol_1 = trainControl(
method = "cv",
number = 5,
verboseIter = TRUE,
returnData = FALSE,
returnResamp = "all", # save losses across all models
classProbs = TRUE, # set to TRUE for AUC to be computed
summaryFunction = twoClassSummary,
allowParallel = TRUE
)
xgb_train_1 = train(
x = as.matrix(train[ , 2:71]),
y = as.matrix(train$sal),
trControl = xgb_trcontrol_1,
tuneGrid = xgb_grid_1,
method = "xgbTree"
)
Update (18.08.10):
When I delete the two parameters (classProbs = TRUE, summaryFunction = twoClassSummary) from the trainControl function, the result is the same:
> xgb_grid_1 = expand.grid(
+ nrounds = 1000,
+ max_depth = c(2, 4, 6, 8, 10),
+ eta=c(0.5, 0.1, 0.07),
+ gamma = 0.01,
+ colsample_bytree=0.5,
+ min_child_weight=1,
+ subsample=0.5
+ )
>
> xgb_trcontrol_1 = trainControl(
+ method = "cv",
+ number = 5,
+ allowParallel = TRUE
+ )
>
> xgb_train_1 = train(
+ x = as.matrix(train[ , 2:71]),
+ y = as.matrix(train$sal),
+ trControl = xgb_trcontrol_1,
+ tuneGrid = xgb_grid_1,
+ method = "xgbTree"
+ )
Error: Metric RMSE not applicable for classification models
It's not strange that caret thinks you are asking for classification, because you are actually doing so in these 2 lines of your trainControl function:
classProbs = TRUE,
summaryFunction = twoClassSummary
Remove both of these lines (so that they take their default values - see the function documentation), and you should be fine.
Notice also that AUC is only applicable to classification problems.
UPDATE (after comments): It seems that the target variable being an integer causes the problem; convert it to double before running the model with
train$sal <- as.double(train$sal)
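Putting both suggestions together, here is a minimal sketch of a regression setup. It is not the asker's exact pipeline: the toy data frame, its size, and the small tuning grid are made up for illustration, and passing y as a plain numeric vector (rather than wrapping it in as.matrix) is an extra assumption based on how the train documentation describes y.

```r
library(caret)   # method = "xgbTree" also requires the xgboost package to be installed

set.seed(1)

# Hypothetical stand-in for the real data: an integer label and binary predictors
n <- 200
toy <- data.frame(
  sal           = as.integer(round(runif(n, 2000, 6000))),
  jt_category_1 = rbinom(n, 1, 0.5),
  job_num_1     = rbinom(n, 1, 0.5)
)

# Label as a plain double vector (not a factor, not a one-column matrix)
y <- as.double(toy$sal)
x <- as.matrix(toy[, -1])

# Sanity checks on the label before calling train()
stopifnot(is.numeric(y), !is.factor(y), is.null(dim(y)))

# No classProbs / twoClassSummary here: those are classification-only options
ctrl <- trainControl(method = "cv", number = 5)

# Deliberately small grid so the sketch runs quickly
grid <- expand.grid(
  nrounds          = 50,
  max_depth        = c(2, 4),
  eta              = 0.1,
  gamma            = 0.01,
  colsample_bytree = 0.5,
  min_child_weight = 1,
  subsample        = 0.5
)

fit <- train(
  x = x,
  y = y,
  method    = "xgbTree",
  trControl = ctrl,
  tuneGrid  = grid,
  metric    = "RMSE"
)

# Shows which kind of model caret decided to fit
print(fit$modelType)
```

If the same error still appears on the real data, checking class(train$sal) and is.factor(train$sal) right before the train call is a quick way to see what caret is actually receiving.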