Search code examples
rpredictlme4mixed-modelsrandom-effects

prediction for lmer-model with nested random effects


I am currently trying to help a colleague and I simply cannot find a solution. So I am hoping that someone else might be able to help us.

I have a dataset containing weight data assessed with different study designs, for different species in different studies (a study included multiple designs and multiple species). I want to investigate the relation between the weight and the study design, using study and species as nested random effect.

The model looks like this and runs fine:

m <- lmer(weight ~ design +(1|study/species), data=dataset)

I tried to make predictions for the different species but with a generic study: I made a new data.table new.dt that contains the unique design-species-combinations of the original dataset and added a column for the report.

new.dt <- unique(dataset[,.(design, species))
new.dt$study <- "xyz"

Then I used the predict-function and allowed new levels.

new.dt$p <- predict(m, newdata=new.dt, re.form= NULL, allow.new.levels=TRUE) 

I don't get an error, but I get the same prediction for every species in a design.

Is there a way to keep the original levels of one part of the nested random effect and make the other part a new level?

Thank you in advance!

UPDATE - working example: This issue is not dependent on the dataset.

library(data.table)
library(lme4)

dt <- data.table(expand.grid(design=c("a", "b"), species=c("x", "y", "z"), report=c("1", "2", "3"), count=seq(1, 10, 1)))
dt$weight <- 0

dt[species=="x"]$weight <- rnorm(60, 70, 10)
dt[species=="y"]$weight <- rnorm(60, 80, 15)
dt[species=="z"]$weight <- rnorm(60, 90, 20)

dt[design=="a"]$weight <- dt[design=="a"]$weight- 0.1*dt[design=="a"]$weight

dt[report=="1"]$weight <- dt[report=="1"]$weight+0.15*dt[report=="1"]$weight
dt[report=="2"]$weight <- dt[report=="2"]$weight-0.15*dt[report=="1"]$weight

m <-lmer(weight~design+(1|report/species), data=dt)

dt.pred <- unique(dt[,c(1:2)])
dt.pred$report<- "xyz"
dt.pred$pred<-predict(m, newdata=dt.pred, re.form= NULL, allow.new.levels=TRUE) 

Solution

  • The 'sameness' comes from the fact that you are setting re.form = NULL or equivalently re.form = ~ 0.

    The Linear mixed effect model models Y|beta,b ~ intercept + X %*% beta + Z %*% b + e, and by setting re.form = NULL you are setting the definition of Z %*% b = 0 during your prediction. As this is the random part of your model, ( i.e. (1|report/species) ) you are removing the random effect of species and report.

    In mixed models you would call this kind of prediction "unconditional prediction" (or marginal prediction) [while it is more pseudo-unconditional in practice]. Often it is used in models where the random effect contains individual. In this case, when you observe a new individual you have an unknown random effect, but depending on your study you might only be interested in the "systematic" or "fixed" effect (i.e. did the individual walk to work before getting hit by the car? Did he bike?). Here it makes sense to look only at the unconditional/marginal effect.

    Said another way, by setting re.form = NULL you are saying Z %*% b = 0. As species is part of Z with weights b, you are unable to see the species-specific effect on your prediction.

    Only if you know the species and can use the random effects in your prediction will you get different prediction across species with the same fixed effects.

    Ps. The data.table package has an equivalent function to expand.grid called CJ, which will for larger sets is quite a bit faster and more memory efficient.