Search code examples
rinteractionlme4

How to find original variables in merMod objects with interaction terms (fitted with lme4 package)?


I would like to calculate interaction effects, like shown in this blog posting, so I can calculate y = (b0 + (b1 * xa) + (b3 * xa * xb)), where b0 is the estimate of the intercept, b1 is the estimate of the predictor A and b3 is the estimate of the interaction between predictor A and B.

To do this, I need

  1. the estimates from the fitted model
  2. the original values of the predictors (i.e. the "data.frame" from the used variables, where each value is inserted into the formula shown above)

For "simple" generalized linear models, I can use the data matrix parameter to store the original data values in the fitted model object (glm(... x=TRUE). Example:

fit <- glm(f1care ~ c12hour + neg_c_7 + c172 + 
           sex1 + sex2 + c172:neg_c_7 + sex1:c12hour,
           data = mydf,
           x = TRUE,
           family = binomial("logit"))

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)    1.4674300  0.1759883   8.338  < 2e-16 ***
c12hour        0.0016157  0.0016315   0.990   0.3220    
neg_c_7       -0.0549614  0.0108868  -5.048 4.45e-07 ***
c1721         -0.1575547  0.2752344  -0.572   0.5670    
sex12         -0.0320619  0.1286984  -0.249   0.8033    
sex22          0.2091354  0.0874629   2.391   0.0168 *  
neg_c_7:c1721  0.0090043  0.0202720   0.444   0.6569    
c12hour:sex12 -0.0003121  0.0018246  -0.171   0.8642  

> head(fit$x)
  (Intercept) c12hour neg_c_7 c1721 sex12 sex22 neg_c_7:c1721 c12hour:sex12
1           1      10       9     0     1     1             0            10
2           1       4      13     1     1     1            13             4
3           1      12      21     0     1     1             0            12
4           1      60      14     0     1     1             0            60
5           1      40      17     0     1     1             0            40
6           1      50      17     0     1     1             0            50

As you can see, the coefficients' names are the same as the column names in the model data frame (fit$x).

If I find an interaction term (e.g. c12hour:sex12), I can split the name at the colon, have c12hour and sex12, and find the values via the model matrix column names (names are identical).

Now my question is, how to do this with merMod objects? The column names of the data frame used to fit the model (fit@frame) with original values looks like this:

library(lme4)
fit <- glmer(f1care ~ c12hour + neg_c_7 + c172 + 
             sex1 + sex2 + 
             c172:neg_c_7 + sex1:c12hour + (1|g2ctry),
             data = mydf,
             family = binomial("logit"))

> colnames(fit@frame)[-1]
[1] "c12hour" "neg_c_7" "c172"    "sex1"    "sex2"    "g2ctry" 

The coeffciients, accessed via summary or fixef, look like this:

Fixed effects:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)    1.2388127  0.2382648   5.199    2e-07 ***
c12hour        0.0018070  0.0016450   1.098 0.272018    
neg_c_7       -0.0387817  0.0115511  -3.357 0.000787 ***
c1721          0.1187357  0.2842743   0.418 0.676181    
sex12         -0.0305578  0.1306499  -0.234 0.815069    
sex22          0.1580400  0.0897806   1.760 0.078358 .  
neg_c_7:c1721 -0.0106958  0.0209961  -0.509 0.610458    
c12hour:sex12 -0.0009486  0.0018350  -0.517 0.605206   

> cbind(fixef(fit))
                       [,1]
(Intercept)    1.2388126568
c12hour        0.0018069626
neg_c_7       -0.0387817065
c1721          0.1187357405
sex12         -0.0305578499
sex22          0.1580400407
neg_c_7:c1721 -0.0106958049
c12hour:sex12 -0.0009485618

As you can see, the column names sex1 and sex2 are not idenctical to the coefficients "names" sex12 and sex22 (unlike in the first, simple glm example, where both are identical).

My current approach is

  • to take the column name (e.g. sex1)
  • check if it is a factor, and if so, retrieve levels (e.g. 1 and 2)
  • concat column name and factor level (so we have sex11 and sex12)
  • find the interaction term, which includes the concatenated "predictor name" (so sex11 will give no match, however, sex12 will match the second interaction term c12hour:sex12)

I wonder if this is the best or only solution, or if there might be cases where my way of "detecting" interaction terms won't work?


Solution

  • Ok, model.matrix does the trick and returns a data frame where column names correspond to the term names in the model summary.