Search code examples
rlme4

merTools predictInterval() for model with nested random effect


Does predictInterval() from the merTools package not like nested random effects? For example, using the msleep dataset from the ggplot2 package:

library("lme4")
library("merTools")
library("ggplot2")
mod <- lmer(sleep_total ~ bodywt + (1|vore/order), data=msleep)
predInt <- predictInterval(merMod=mod, newdata=msleep) 

Returns an error:

Error in '[.data.frame'(newdata, , j) : undefined columns selected

This runs fine no problem:

mod <- lmer(sleep_total ~ bodywt + (1|vore) + (1|order), data=msleep)
predInt <- predictInterval(merMod=mod, newdata=msleep)

(Well actually it gives a warning about NA levels in the random effect variables, but I'm not concerned about that)

UPDATE

As discussed in the comments of Ben Bolker's answer below, a new version of merTools accounts for nested random effects. However, when I try to predict for data that contains new levels of the nested random effect, I get errors.

This works:

mod <- lmer(sleep_total ~ bodywt + (1|vore/order), data=msleep)
predInt <- predictInterval(merMod=mod, newdata=msleep) 

And this works, albeit with a couple of warnings (see below for additional Q about the warnings*):

mod <- lmer(sleep_total ~ bodywt + (1|vore) + (1|order), data=msleep)
msleep2 <- msleep %>% mutate(vore = "omni")
predInt <- predictInterval(merMod=mod, newdata=msleep2) 

But this does not work:

mod <- lmer(sleep_total ~ bodywt + (1|vore/order), data=msleep)
msleep2 <- msleep %>% mutate(vore = "omni")
predInt <- predictInterval(merMod=mod, newdata=msleep2) 

With the following error:

Error in `[.data.frame`(tmp, alllvl) : undefined columns selected
In addition: Warning message:
In predictInterval(merMod = mod, newdata = msleep3) :
  newdata is tbl_df or tbl object from dplyr package and has been
              coerced to a data.frame

And here, "omni" isn't actually a new level of vore, but when combined with order, it creates new nested combinations of the variables.

If I use "new" or anything else that isn't an observed level of vore, I get similar results: It works for the non-nested version of the model, but not for the nested version.


*Also, should I be concerned about the warning given by the second model chunk above:

> mod <- lmer(sleep_total ~ bodywt + (1|vore) + (1|order), data=msleep)
> msleep2 <- msleep %>% mutate(vore = "omni")
> predInt <- predictInterval(merMod=mod, newdata=msleep2)
Warning messages:
  1: In predictInterval(merMod = mod, newdata = msleep2) :
     newdata is tbl_df or tbl object from dplyr package and has been
       coerced to a data.frame
  2: In chol.default(sigma, pivot = TRUE) :
     the matrix is either rank-deficient or indefinite

I'm going to guess the second one is a result of vore taking on the same value for each observation, but that shouldn't be an issue for prediction, should it? I could see it being an issue if the variable took on the same value when I was fitting the model, but don't think it should be an issue when predicting new observations?


Solution

  • One can (apparently) work around this by writing out the interaction term explicitly. Warning: I haven't actually checked to make sure the resulting predictions are correct, just seen that no error is produced and the resulting object is approximately sensible ...

    msleep <- transform(msleep,voreOrder=interaction(vore,order,drop=TRUE))
    mod2 <- lmer(sleep_total ~ bodywt + (1|vore)+(1|voreOrder), data=msleep)
    predInt <- predictInterval(merMod=mod2, newdata=msleep) 
    

    This does generate warning messages, but apparently they're due to <NA> values in the vore variable (I don't know this data set ...)