Search code examples
rlme4predictspline

Why does R's predict function throw an error for glmerMod models using bs splines, and how can I obtain model predictions in this case?


I am trying to obtain model fitted values using the predict function for a glmer model with splines::bs terms. The predict function throws this error:

Error in `[.data.frame`(fr, vars) : undefined columns selected

Here is a reproducible example using simulated data:

# Reproducible example of glmer spline predict problem

library(tibble)
library(dplyr)
library(splines)
library(lme4)

## Simulate data
set.seed(1234L)

n_groups <- 100L

num_per_group <- rpois(n_groups, 50L) + 1L

group_mean <- rnorm(n_groups, sd = 0.2)

x_vals <- c(-20:19)

simulated_data <- tibble(x = sample(x_vals,
                                    sum(num_per_group),
                                    replace = TRUE),
                         group_no = rep(c(1:n_groups),
                                        num_per_group),
                         group_mean = rep(group_mean,
                                          num_per_group)) %>%
  arrange(x, group_no) %>%
  mutate(lin_pred = 3 - 0.05*abs(x) + group_mean,
         y = exp(lin_pred + rnorm(sum(num_per_group),
                                  mean = 0, sd = 0.1)),
         y = round(y))

# Fit model

glmer_spline <- glmer(y ~ bs(x,
                             knots = c(0),
                             degree = 1L) +
                        (1 | group_no),
                      data = simulated_data,
                      family = poisson(link = "log"),
                      control = glmerControl(optimizer="bobyqa")
)

summary(glmer_spline)

# This throws an error:
# Error in `[.data.frame`(fr, vars) : undefined columns selected

prediction_output_splines <- predict(glmer_spline,
                                     newdata = simulated_data,
                                     type = "response",
                                     allow.new.levels = TRUE,
                                     re.form=~0)

I am using R version 4.2.3 (2023-03-15), tibble 3.2.1, dplyr 1.1.2, lme4 1.1-34, in RStudio "Mountain Hydrangea" Release (547dcf86, 2023-07-06) for macOS on a 2019 MacBook Pro running Ventura 13.4.1.

I would like to know whether there is a mistake in my code, whether I should be using a different function to obtain predictions, whether what I'm trying to do is somehow flawed, or whether there is a bug in one of the packages involved. Any help much appreciated.


Solution

  • TL;DR

    The problem comes from non-standard evaluation, which is tripped up by the fact that you are passing 1L instead of 1 to the degree argument of bs

    Explanation

    The error is actually thrown from inside model.frame.mermod, where the model frame has column names captured from your formula completely as-is:

    [1] "y"                             "bs(x, knots = 0, degree = 1L)"
    [3] "group_no" 
    

    Whereas the function tries to subset with variable names which have ultimately been obtained from stats::terms.formula. This puts the arguments through a C function, and in the process the L is removed from 1L, resulting in:

    [1] "y"                            "bs(x, knots = 0, degree = 1)"
    

    Since the second string doesn't match the second column name (due to the missing L), you get the "unselected column names selected" error.

    I guess you could call this a bug, but it would be quite a task to rewrite the code simply to cover the possibility that a user writes an unnecessary L after one of the numbers in the right hand side of their formula.

    Solution

    All you need to do is change 1L to 1, and everything works as expected:

    glmer_spline <- glmer(y ~ bs(x,
                                 knots = 0,
                                 degree = 1) +
                            (1 | group_no),
                          data = simulated_data,
                          family = poisson(link = "log"),
                          control = glmerControl(optimizer="bobyqa")
    )
    

    Then you get no errors when you run

    prediction_output_splines <- predict(glmer_spline,
                                         newdata = simulated_data,
                                         type = "response",
                                         allow.new.levels = TRUE,
                                         re.form=~0)
    

    Giving you

    head(prediction_output_splines)
    #>         1        2        3        4        5        6 
    #>  7.495069 7.495069 7.495069 7.495069 7.495069 7.495069