Tags: r, lme4, mixed-models, nlme, marginal-effects

ggeffects::ggpredict() doesn't return population level prediction intervals for nlme::lme()


I'm trying to get population-level prediction intervals (PIs) from ggeffects::ggpredict() using type = "re" with an nlme::lme() model. ggpredict() is not returning the expected output for the lme() model, while the equivalent lmer() model works fine. My data are autocorrelated repeated measures, so I need lme() with correlation = corAR1().

I'm not sure whether this is an error, or whether I'm just trying to do something the tools I'm using aren't designed for.

library(lme4)
library(nlme)
library(ggeffects)

Data <- data.frame(
  Subject = factor(c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
                     4, 4, 4, 4, 4, 4, 4, 4, 4, 
                     5, 5, 5, 5, 5, 5, 5, 5, 5, 
                     6, 6, 6, 6, 6, 6, 6, 6, 
                     7, 7, 7, 7, 7, 7, 7, 7, 
                     8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 
                     9, 9, 9, 9, 9, 9, 9, 9, 
                     13, 13, 13, 13, 13, 13, 13, 13, 
                     14, 14, 14, 14, 14, 14, 14, 14, 14, 
                     19, 19, 19, 19, 19, 19, 19)), 
  x = c(20.0, 28.5, 38.0, 47.5, 57.0, 66.5, 76.0, 85.5, 95.0, 100.0, 
           21.0, 31.5, 42.0, 53.0, 63.0, 73.5, 84.0, 95.0, 100.0, 
           20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 
           22.0, 33.0, 44.0, 56.0, 67.0, 78.0, 89.0, 100.0, 
           21.5, 32.0, 43.0, 54.0, 65.0, 76.0, 86.5, 100.0, 
           20.0, 29.0, 38.5, 48.5, 58.0, 67.5, 77.0, 87.0, 96.5, 100.0, 
           23.0, 33.0, 44.0, 56.0, 67.0, 78.0, 89.0, 100.0, 
           23.5, 34.5, 46.5, 57.5, 69.5, 80.5, 92.5, 100.0, 
           20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 
           25.0, 37.5, 50.0, 62.5, 75.0, 87.5, 100.0)/100,
  y = c(1.10, 1.00, 1.25, 1.60, 1.40, 1.20, 2.50, 4.60, 6.80, 10.40, 
          0.90, 1.00, 0.75, 0.90, 1.10, 1.70, 4.35, 9.95, 11.45, 
          1.20, 0.70, 1.30, 1.40, 0.70, 1.25, 2.30, 4.30, 8.20, 
          1.55, 1.15, 0.95, 1.10, 1.90, 3.25, 7.20, 14.30, 
          1.85, 2.00, 1.70, 2.00, 2.35, 3.30, 7.30, 12.10, 
          2.20, 1.95, 1.15, 1.55, 1.65, 3.00, 4.45, 9.05, 13.75, 15.85, 
          1.55, 1.20, 1.35, 1.60, 1.65, 4.70, 6.45, 10.80, 
          1.00, 0.90, 1.00, 1.10, 1.60, 3.60, 8.05, 12.30, 
          0.85, 1.00, 1.05, 1.00, 1.35, 2.00, 3.65, 6.75, 13.10, 
          2.25, 2.35, 2.40, 2.80, 4.90, 8.15, 13.50)
)

Model.lme4 <- lmer(
  y ~ x + (1 | Subject),
  data = Data
)

# first running lme() without autocorrelation
Model.nlme <- lme(
  fixed = y ~ x,
  random = ~ 1 | Subject, 
  data = Data,
)

# The expected output is returned for the lmer() model:
ggpredict(
  Model.lme4,
  terms = c("x [all]"),
  type = "re",
)
# Predicted values of y
# 
#    x | Predicted |         95% CI
# ---------------------------------
# 0.20 |     -1.09 | [-5.93,  3.74]
# 0.28 |     -0.08 | [-4.89,  4.73]
# 0.38 |      0.99 | [-3.80,  5.78]
# 0.47 |      2.06 | [-2.71,  6.84]
# 0.58 |      3.38 | [-1.39,  8.14]
# 0.67 |      4.51 | [-0.26,  9.28]
# 0.77 |      5.70 | [ 0.92, 10.48]
# 1.00 |      8.44 | [ 3.61, 13.27]
# 
# Adjusted for:
# * Subject = 0 (population-level)
# 
# Intervals are prediction intervals.

# When run on the lme() model, predicted values & PIs are missing:
ggpredict(
  Model.nlme,
  terms = c("x [all]"),
  type = "re",
)
# Predicted values of y
# 
#    x
# ----
# 0.20
# 0.28
# 0.38
# 0.47
# 0.58
# 0.67
# 0.77
# 1.00
# 
# Adjusted for:
# * Subject = 0 (population-level)
# 
# Intervals are prediction intervals.

If I use correlation = corAR1() (with no form argument), it produces the same results as above. The same thing also happens if I explicitly call terms = c("x [all]", "Subject [0]").
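
For reference, the explicit call (exactly as described above) is:

ggpredict(
  Model.nlme,
  terms = c("x [all]", "Subject [0]"),
  type = "re"
)
# returns the same empty table as above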

When I add the intended autocorrelation structure, I get prediction & PI values, but only for the first level of the Subject factor:

Model.nlme <- lme(
  fixed = y ~ x,
  random = ~ 1 | Subject, 
  correlation = corAR1(form = ~ x | Subject),
  data = Data,
)

ggpredict(
  Model.nlme,
  terms = c("x [all]"),
  type = "re",
)
# Predicted values of y
# 
#    x | Predicted |         95% CI
# ---------------------------------
# 0.20 |      1.64 | [-1.44,  4.71]
# 0.28 |      2.82 | [-0.23,  5.87]
# 0.38 |      4.07 | [ 1.03,  7.12]
# 0.47 |      5.33 | [ 2.29,  8.36]
# 0.58 |      6.86 | [ 3.82,  9.90]
# 0.67 |      8.18 | [ 5.13, 11.24]
# 0.77 |      9.58 | [ 6.50, 12.66]
# 1.00 |     12.78 | [ 9.61, 15.95]
# 
# Adjusted for:
# * Subject = 3
# 
# Intervals are prediction intervals.

Am I making an error somewhere? Or is there a better way to get the PIs that I want? Thanks!


Solution

  • Unfortunately, I think you might be stuck.

    The underlying problem is that broom.mixed::tidy() doesn't return the standard deviations of the BLUPs for models fitted with nlme::lme():

    library(broom.mixed)
    tidy(Model.lme4, effects = "ran_vals") ## includes SDs
    tidy(Model.nlme, effects = "ran_vals") ## doesn't
    

    However, this isn't really broom.mixed's fault. The problem is that at present the nlme package doesn't offer any option to return the standard deviations of the BLUPs (see e.g. this r-sig-mixed-models thread, which discusses the issue but peters out without providing a solution).
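
    You can see the same gap one level down: lme4's ranef() can attach the conditional standard deviations of the BLUPs (via its condVar argument), whereas nlme's ranef() method only returns the point estimates and has no comparable option:

    ## lme4: conditional SDs are available ("condsd" column)
    as.data.frame(ranef(Model.lme4, condVar = TRUE))
    ## nlme: point estimates only, no way to request their SDs
    ranef(Model.nlme)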

    In order to fix this, as best I can tell, you (or someone) would have to dig into the guts of nlme and figure out how to construct the SDs for the BLUPs. If I were doing it, I would start with Bates et al. 2015, equations 56-59, and see whether it's possible to extract the corresponding components from an lme fit. In particular, see section 2.2 of Pinheiro and Bates 2000 for the notation and framework used in nlme: eq. 2.22 (p. 71) gives the expression for the BLUPs; hopefully one could match that up with the lme code and derive the corresponding expression for the SDs.
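
    To give a flavour of what that would involve: for a model with only a random intercept, the conditional variance of each BLUP has a closed form, Var(b_i | y) = sigma_b^2 - sigma_b^4 * 1' V_i^{-1} 1, with V_i = sigma_b^2 * J + sigma^2 * R_i (R_i being the within-subject AR1 correlation matrix). A rough hand-rolled sketch for the AR1 fit from the question, which ignores the uncertainty in the fixed effects and in the variance-parameter estimates (so it's only an approximation to what lme4 reports), could look like this:

    vc       <- nlme::VarCorr(Model.nlme)                   ## variance components
    sigma_b2 <- as.numeric(vc["(Intercept)", "Variance"])   ## random-intercept variance
    sigma2   <- as.numeric(vc["Residual", "Variance"])      ## residual variance
    ## per-subject AR1 correlation matrices from the fitted corStruct
    Rlist <- nlme::corMatrix(Model.nlme$modelStruct$corStruct)
    blup_sd <- sapply(Rlist, function(R) {
        n  <- nrow(R)
        Vi <- sigma_b2 * matrix(1, n, n) + sigma2 * R  ## V_i = sigma_b^2 J + sigma^2 R_i
        sqrt(sigma_b2 - sigma_b2^2 * sum(solve(Vi)))   ## 1' V_i^{-1} 1 = sum(V_i^{-1})
    })
    blup_sd  ## one conditional SD per Subject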

    Alternatively, you can fit an AR1 model in glmmTMB, which should be tidy-able/ggpredict-able. Code is shown below; I think these are equivalent models, but the example data you've given above doesn't support fitting an AR1 model [we get a singular fit], so I can't be sure.

    ## define a per-subject observation-number variable (factor)
    ## alt. tidyverse: Data |> group_by(Subject) |> mutate(obs = seq(n())) |>
    ##                  mutate(across(obs, factor))
    Data2 <- (Data
        |> split(Data$Subject)
        |> lapply(FUN = \(x) transform(x, obs = seq(nrow(x))))
        |> do.call(what = rbind)
        |> transform(obs = factor(obs))
    )
    Model.nlme.acf <- update(Model.nlme,
                             data = Data2,
                             correlation = corAR1(form = ~obs))
    library(glmmTMB)
    Model.glmmTMB <-  glmmTMB(y ~ x + (1|Subject) + ar1(0 + obs|Subject),
                              data = Data2)
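
    If the glmmTMB fit does converge cleanly on a richer data set, the population-level prediction intervals should then be available the same way as for the lmer() fit, e.g. (untested on these data because of the singular fit):

    ggpredict(Model.glmmTMB, terms = "x [all]", type = "re")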