My question is about how to handle missing values when fitting models with train from caret. A small sample of my data looks like this:
df <- dput(dat)
structure(list(LagO3 = c(NA, NA, NA, 40, 45, NA), RH = c(69.4087524414062,
79.9608383178711, 64.4592437744141, 66.4207077026367, 66.0899200439453,
91.3353729248047), SR = c(298.928888888889, 300.128888888889,
303.688888888889, 304.521111111111, 303.223333333333, 294.716666666667
), ST = c(317.9917578125, 317.448253038194, 311.039059244792,
312.557927517361, 321.252841796875, 330.512212456597), Tmx = c(294.770359293045,
294.897191864461, 295.674552786042, 296.247345044048, 296.108238352818,
294.594430242372), CWTE = c(0, 1, 0, 0, 0, 0), CWTW = c(0, 0,
0, 0, 0, 0), o3 = c(NA, NA, NA, 52, 55, NA)), .Names = c("LagO3",
"RH", "SR", "ST", "Tmx", "CWTE", "CWTW", "o3"), row.names = c("1",
"2", "3", "4", "5", "6"), class = "data.frame")
The problem is that one of my predictors has NA in several rows, and the predictand (o3) also has NA, but in different rows. First, I tried:
model <- train(x = na.omit(x.training), y = na.omit(training$o3), method = "lmStepAIC",
direction="backward", trControl = control)
But then x and y end up with different lengths, because na.omit drops different rows from each. I also tried:
model <- train(x = x.training, y = training$o3,na.action=na.pass,
method = "lmStepAIC",direction="backward",trControl = control)
which gives the following error:
Error in quantile.default(y, probs = seq(0, 1, length = cuts)) : missing values and NaN's not allowed if 'na.rm' is FALSE
I would appreciate any suggestion!
Thanks a lot.
You need to use the na.action argument of the train function and set it to na.omit. As the documentation says for na.action (type ?train):
A function to specify the action to be taken if NAs are found. The default action is for the procedure to fail. An alternative is na.omit, which leads to rejection of cases with missing values on any required variable. (NOTE: If given, this argument must be named.)
So the following will work:
model <- train(x = x.training, y = training$o3,
method = "lmStepAIC",direction="backward",
trControl = control, na.action=na.omit)
Output:
> model <- train(x = x.training, y = y.training, method = "lmStepAIC",direction="backward",
+ na.action=na.omit)
Start: AIC=-129.7
.outcome ~ LagO3 + RH + SR + ST + Tmx + CWTE + CWTW
Step: AIC=-129.7
.outcome ~ LagO3 + RH + SR + ST + Tmx + CWTE
Step: AIC=-129.7
.outcome ~ LagO3 + RH + SR + ST + Tmx
Step: AIC=-129.7
.outcome ~ LagO3 + RH + SR + ST
Step: AIC=-129.7
.outcome ~ LagO3 + RH + SR
Step: AIC=-129.7
.outcome ~ LagO3 + RH
Step: AIC=-129.7
.outcome ~ LagO3
Step: AIC=-129.7
.outcome ~ 1
...
...
...
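If you prefer not to rely on how train handles na.action, another option is to drop the incomplete rows yourself, jointly across the predictors and the outcome, so that x and y keep the same length. Here is a minimal sketch of that idea. The data are made up for illustration (they are not the data from the question), and the train call only runs if caret and MASS are installed; the 5-fold CV control is just an assumption standing in for your own control object.

```r
# Made-up data for illustration: NAs in a predictor and in the outcome,
# at partly different rows (as described in the question).
set.seed(1)
n <- 40
dat <- data.frame(LagO3 = rnorm(n, 45, 8), RH = runif(n, 60, 95))
dat$o3 <- 10 + 0.9 * dat$LagO3 - 0.1 * dat$RH + rnorm(n, sd = 3)
dat$LagO3[c(1, 5, 9)] <- NA
dat$o3[c(2, 5)] <- NA

x <- dat[, c("LagO3", "RH")]
y <- dat$o3

# Dropping NAs separately removes different rows, hence different lengths.
nrow(na.omit(x))    # 37
length(na.omit(y))  # 38

# Keep only the rows that are complete in BOTH x and y.
keep <- complete.cases(x, y)
x.cc <- x[keep, , drop = FALSE]
y.cc <- y[keep]
sum(keep)           # 36

# Fit only if the packages are available (assumption: caret and MASS installed).
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("MASS", quietly = TRUE)) {
  library(caret)
  control <- trainControl(method = "cv", number = 5)  # stand-in for your control
  model <- train(x = x.cc, y = y.cc, method = "lmStepAIC",
                 direction = "backward", trControl = control)
}
```

The key point is that complete.cases(x, y) builds a single row mask from both objects, so the same rows are removed from x and y.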