Tags: r, cross-validation, r-caret, r-recipes

caret's train.recipe does not seem to apply the recipe step that removes NAs, so cross-validation fails


The caret package does not seem to apply the recipe step that removes NAs before cross-validation. I assume I am overlooking something.

library(data.table)
library(caret)
library(recipes)

iris_dt <- as.data.table(iris)
iris_dt[3:5, ':='(Petal.Length = NA)]  # introduce NAs in the outcome
control <- trainControl(method = 'cv', number = 2, allowParallel = TRUE)
rec <- recipe(Petal.Length ~ Sepal.Width, iris_dt) %>% step_naomit(all_outcomes(), all_predictors())
train(rec, iris_dt, method = 'lm', trControl = control)  # fails with the error below

Error in quantile.default(y, probs = seq(0, 1, length = cuts)) : missing values and NaN's not allowed if 'na.rm' is FALSE

It also fails when the regressor contains NAs, although with a different error message. When the recipe is prepped, the data is baked, and the result is passed to the x/y interface of train(), it works.
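For reference, the working variant mentioned above might look roughly like the sketch below. It assumes a recipes version in which bake(new_data = NULL) returns the processed training set (older versions used juice() for this); the object names prepped, baked, x, y and fit are only illustrative.

```r
library(data.table)
library(caret)
library(recipes)

iris_dt <- as.data.table(iris)
iris_dt[3:5, Petal.Length := NA]  # introduce NAs in the outcome

rec <- recipe(Petal.Length ~ Sepal.Width, data = iris_dt) %>%
  step_naomit(all_outcomes(), all_predictors())

# Estimate the recipe, then extract the processed training data.
# new_data = NULL returns the retained training set with the NA rows dropped.
prepped <- prep(rec, training = iris_dt)
baked <- bake(prepped, new_data = NULL)

# Hand clean predictors and outcome to the x/y interface of train()
x <- as.data.frame(baked[, "Sepal.Width"])
y <- baked$Petal.Length

control <- trainControl(method = "cv", number = 2)
fit <- train(x = x, y = y, method = "lm", trControl = control)
```

Because the rows with NAs are gone before train() builds its folds, the resampling step never sees a missing outcome.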

Many thanks for any hints.


Solution

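  • The recipe works fine, but the resamples are created before the recipe is applied, so the rows with NAs are still in the data when the folds are built. You should remove those rows before calling train, or use the formula method with na.action = na.omit: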

    > iris_dt <- as.data.table(iris)
    > iris_dt[3:5,':='(Petal.Length=NA)]
    > control <- trainControl(method='cv',number=2,allowParallel = T)
    > rec <- recipe(Petal.Length ~ Sepal.Width,iris_dt) %>% step_naomit(all_outcomes(),all_predictors())
    > train(Petal.Length ~ Sepal.Width,iris_dt,method='lm',trControl = control, na.action = na.omit)
    Linear Regression 
    
    150 samples
      1 predictor
    
    No pre-processing
    Resampling: Cross-Validated (2 fold) 
    Summary of sample sizes: 74, 73 
    Resampling results:
    
      RMSE      Rsquared   MAE     
      1.610659  0.1885815  1.363651
    
    Tuning parameter 'intercept' was held constant at a value of TRUE
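
The other option, removing the incomplete rows before calling train while still using the recipe interface, could be sketched as follows. This is a minimal example under assumptions: the names iris_complete and fit are illustrative, and the conversion to a plain data.frame is a cautious choice rather than a documented requirement.

```r
library(data.table)
library(caret)
library(recipes)

iris_dt <- as.data.table(iris)
iris_dt[3:5, Petal.Length := NA]  # introduce NAs in the outcome

# Drop rows with NAs in the modelled columns before any resampling happens
iris_complete <- as.data.frame(
  na.omit(iris_dt, cols = c("Petal.Length", "Sepal.Width"))
)

# The recipe no longer needs step_naomit(), since the data is already complete
rec <- recipe(Petal.Length ~ Sepal.Width, data = iris_complete)

control <- trainControl(method = "cv", number = 2)
fit <- train(rec, data = iris_complete, method = "lm", trControl = control)
```

Here the folds are created from data that has no missing values, so the recipe is free to contain any further preprocessing steps.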