Search code examples
rlme4mixed-modelsmarginal-effects

Marginal effects for de-meaned polynomials in mixed models


In the mixed model (or REWB) framework it is common to model within changes by subtracting the cluster mean (demeaning) from a time varying x-variable, see eg. (Bell, Fairbrother & Jones, 2018). This estimator is basically the same as a fixed effects (FE) estimator (shown below using the sleepstudy data).

The issue arises when trying to model polynomials using the same principle. The equality between the estimators break when we enter our demeaned variable as a polynomial. We can restore this equality by first squaring the variable and then demeaning (see. re_poly_fixed).

dt <- lme4::sleepstudy
dt$days_squared <- dt$Days * dt$Days
dt <- cbind(dt, datawizard::demean(dt, select = c("Days", "days_squared"), group = "Subject"))

re <- lme4::lmer(Reaction ~ Days_within + (1 | Subject), data = dt, REML = FALSE)
fe <- fixest::feols(Reaction ~ Days | Subject, data = dt)

re_poly <- lme4::lmer(Reaction ~ poly(Days_within, 2, raw = TRUE) + (1 | Subject),
                      data = dt, REML = FALSE)

fe_poly <- fixest::feols(Reaction ~ poly(Days, 2, raw = TRUE) | Subject, data = dt)

re_poly_fixed <- lme4::lmer(Reaction ~ Days_within + days_squared_within + (1 | Subject),
                            data = dt, REML = FALSE)
models <-
  list("re" = re, "fe" = fe, "re_poly" = re_poly, "fe_poly" = fe_poly, "re_poly_fixed" = re_poly_fixed)

modelsummary::modelsummary(models)

The main issue with this strategy is that for postestimation, especially packages that calculate marginal effects (e.g. marginaleffects in R or margins in STATA) the variable needs to be entered as a polynomial term for the calculations to consider both x and x^2. That is using poly() or I() in R or factor notation c.x##c.x in STATA). The difference can be seen in the two calls below, where the FE-call returns one effect for "Days" and the manual call returns two separate terms.

(me_fe <- summary(marginaleffects::marginaleffects(fe_poly)))
(me_re <- summary(marginaleffects::marginaleffects(re_poly_fixed)))

I may be missing something obvious here, but is it possible to retain the equality between the estimators in FE and the Mixed model setups with polynomials, while still being able to use common packages for marginal effects?


Solution

  • The problem is that when a transformed variable is hardcoded, the marginaleffects package does not know that it should manipulate both the transformed and the original at the same time to compute the slope. One solution is to de-mean inside the formula with I(). You should be aware that this may make the model fitting less efficient.

    Here’s an example where I pre-compute the within-group means using data.table, but you could achieve the same result with dplyr::group_by():

    library(lme4)
    library(data.table)
    library(modelsummary)
    library(marginaleffects)
    
    dt <- data.table(lme4::sleepstudy)
    dt[, `:=`(Days_mean = mean(Days),
              Days_within = Days - mean(Days)),
        by = "Subject"]
    
    re_poly <- lmer(
        Reaction ~ poly(Days_within, 2, raw = TRUE) + (1 | Subject),
        data = dt, REML = FALSE)
    
    re_poly_2 <- lmer(
        Reaction ~ poly(I(Days - Days_mean), 2, raw = TRUE) + (1 | Subject),
        data = dt, REML = FALSE)
    
    models <- list(re_poly, re_poly_2)
    
    modelsummary(models, output = "markdown")
    
    Model 1 Model 2
    (Intercept) 295.727 295.727
    (9.173) (9.173)
    poly(Days_within, 2, raw = TRUE)1 10.467
    (0.799)
    poly(Days_within, 2, raw = TRUE)2 0.337
    (0.316)
    poly(I(Days - Days_mean), 2, raw = TRUE)1 10.467
    (0.799)
    poly(I(Days - Days_mean), 2, raw = TRUE)2 0.337
    (0.316)
    SD (Intercept Subject) 36.021 36.021
    SD (Observations) 30.787 30.787
    Num.Obs. 180 180
    R2 Marg. 0.290 0.290
    R2 Cond. 0.700 0.700
    AIC 1795.8 1795.8
    BIC 1811.8 1811.8
    ICC 0.6 0.6
    RMSE 29.32 29.32

    The estimated average marginal effects are – as expected – different:

    marginaleffects(re_poly) |> summary()
    #>          Term Effect Std. Error z value   Pr(>|z|) 2.5 % 97.5 %
    #> 1 Days_within  10.47     0.7989    13.1 < 2.22e-16 8.902  12.03
    #> 
    #> Model type:  lmerMod 
    #> Prediction type:  response
    
    marginaleffects(re_poly_2) |> summary()
    #>   Term Effect Std. Error z value   Pr(>|z|) 2.5 % 97.5 %
    #> 1 Days  10.47     0.7989    13.1 < 2.22e-16 8.902  12.03
    #> 
    #> Model type:  lmerMod 
    #> Prediction type:  response