Cannot run glmer models with na.action=na.fail, necessary for MuMIn dredge function

Mac OS 10.9.5, R 3.2.3, MuMIn_1.15.6, lme4_1.1-10

Reproducible example code, using example data

The MuMIn user guide recommends setting na.action = na.fail, because otherwise the dredge function will not work. I have found this to be the case:

Error in dredge: 'global.model''s 'na.action' argument is not set and options('na.action') is "na.omit".

However, when I try to run a glmer model with na.action=na.fail, I get this:

Error in na.fail.default(list(pr = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, : missing values in object

Do I have any options other than removing every observation with an NA? My full data set has 10,000 observations and 23 predictor variables, with NAs in different observations for different predictors. Removing every observation with an NA would waste data, which I'd like to avoid.

Solution

It is difficult to know what you are asking.

From ?MuMIn::dredge: 'Use of na.action = "na.omit" (R's default) or "na.exclude" in global.model must be avoided, as it results with sub-models fitted to different data sets, if there are missing values. Error is thrown if it is detected.'

In your example, glmer fits without error if you leave the default options(na.action = "na.omit") in place:

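# check the current default; R's default is "na.omit"
getOption("na.action")

# rows with an NA in any model variable are silently dropped
mod.na.omit <- glmer(formula = pr ~ yr + soil_dist + sla_raw +
                       yr:soil_dist + yr:sla_raw + (1 | plot) + (1 | subplot),
                     data = coldat,
                     family = binomial)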

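But setting options(na.action = "na.fail"), or passing na.action = na.fail to glmer, makes glmer stop with an error. That is exactly what na.fail is documented to do: it returns the data unchanged if there are no missing values and throws an error otherwise.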
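A minimal base-R sketch of the same failure, using glm instead of glmer so it runs without lme4. The data frame `toy` is hypothetical (it only borrows column names from the question). na.fail is applied to the model frame, so a single NA in any model variable stops the fit before any estimation happens.

```r
# Hypothetical toy data (not the question's coldat): one NA in a predictor
toy <- data.frame(
  pr      = c(0, 1, 0, 1, 1, 0, 1, 0),
  yr      = c(1, 1, 2, 2, 3, 3, 4, 4),
  sla_raw = c(2.1, NA, 3.4, 2.8, 3.9, 2.2, 3.1, 2.7)
)

# na.fail checks the model frame (pr, yr, sla_raw) and stops on any NA
msg <- tryCatch(
  {
    glm(pr ~ yr + sla_raw, data = toy, family = binomial,
        na.action = na.fail)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

This is the same "missing values in object" error shown in the question, raised by na.fail.default.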

If you compare the number of rows in coldat, the number of complete cases in coldat, and the number of rows actually used by mod.na.omit, you get the following:

> # number of rows in coldat
> nrow(coldat)
[1] 3171

> # number of complete cases in coldat
> nrow(coldat[complete.cases(coldat), ])
[1] 2551

> # number of rows in data included in glmer model when using 'na.omit'
> length(mod.na.omit@frame$pr)
[1] 2551

With the example data you provided, the complete cases of coldat and the rows that glmer actually used with na.omit (mod.na.omit@frame) are the same 2551 rows. That will not hold in general. dredge fits sub-models that each contain a subset of the global model's predictors, and under na.omit each sub-model drops only the rows that are missing one of its own variables. A sub-model that leaves out a predictor with NAs can therefore be fitted to more rows than the global model, and information criteria such as AICc are not comparable across models fitted to different data. This is the risk the documentation describes, so rather than silently fitting sub-models to different data sets, dredge takes a conservative approach to NAs and throws an error.
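To make the "different data sets" problem concrete, here is a small base-R sketch with hypothetical data, using lm in place of glmer. Only x2 has missing values, so dropping x2 from the model changes which rows are used.

```r
# Hypothetical toy data: x2 is missing in two rows, x1 is complete
set.seed(1)
d <- data.frame(
  y  = rnorm(10),
  x1 = rnorm(10),
  x2 = c(rnorm(8), NA, NA)
)

# With na.omit, each model drops only rows missing in its own variables
global <- lm(y ~ x1 + x2, data = d, na.action = na.omit)
sub    <- lm(y ~ x1,      data = d, na.action = na.omit)

# The sub-model is fitted to more rows than the global model
c(global = nobs(global), sub = nobs(sub))  # global = 8, sub = 10
```

Comparing AIC values for these two fits would mean comparing likelihoods computed on 8 and 10 observations, which is what dredge's error is guarding against.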

So you basically have two options: remove the incomplete cases (which you indicated you want to avoid) or impute (interpolate) the missing values. I typically avoid imputation when there are large blocks of missing data that make estimating a value fraught, and remove incomplete cases instead.
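If you do remove incomplete cases, it is sufficient to require completeness only in the variables that appear in the global model, not in every column of the data frame. In the example data this makes no difference (2551 rows either way), but if the full data set has extra columns that the model never uses, it can keep more rows. A minimal base-R sketch, assuming a hypothetical data frame `toy` with an unused column `notes`, and using glm in place of glmer:

```r
# Hypothetical toy data: 'notes' is a column the model never uses
toy <- data.frame(
  pr      = c(0, 1, 0, 1, 1, 0, 1, 0, 1, 0),
  yr      = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  sla_raw = c(2.1, NA, 3.4, 2.8, 3.9, 2.2, 3.1, 2.7, 3.5, 2.4),
  notes   = c(NA, "a", NA, "b", NA, "c", NA, "d", NA, "e")
)

# Variables that appear in the global model (response + predictors)
model_vars <- c("pr", "yr", "sla_raw")

# complete.cases() over ALL columns also drops rows that only lack 'notes'
n_all_cols <- sum(complete.cases(toy))

# Restricting the check to model variables keeps those rows
toy_cc <- toy[complete.cases(toy[, model_vars]), ]
n_model_cols <- nrow(toy_cc)

# No NAs remain in the model variables, so na.fail does not stop the fit
fit <- glm(pr ~ yr + sla_raw, data = toy_cc, family = binomial,
           na.action = na.fail)
c(all_columns = n_all_cols, model_columns = n_model_cols, used = nobs(fit))
```

With the real data, the analogous step would be to subset coldat on the variables in the global glmer formula (pr, yr, soil_dist, sla_raw, plot and subplot), refit the global model with na.action = na.fail, and pass that model to dredge. Every sub-model then sees the same rows.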