Search code examples
remmeans

Emmeans does not give me the correct adjusted means from the model


I use emmeans to derive adjusted means from my linear mixed-effect regression model, but the results do not seem to be correct. I want to plot the model fit and the adjusted values of the individual data points, but the results look weird:

ggplot

The estimated adjusted means seems to be too high for Course A and too low on Course C. In my linear mixed-effect regression, I am predicting the posttest with pretest as a covariate and the main effect and interaction of Group and Course. Because I have repeated measures on Course and different testing conditions, I have included a random intercept for Course and School. Using emmeans I get the following estimates:

# model fit
CI_post <- lmer(
  post.diff ~ 
    pre.diff +
    group * course 
  + (1|bib) 
  + (1|school), 
  data = dat, 
  REML = FALSE)

#estimated adjusted means
emmeans(CI_post, specs = c("course", "group"),lmer.df = "satterthwaite")

# Results
 course group       emmean    SE   df lower.CL upper.CL
 A      blocked      0.311 0.191 6.65  -0.1452    0.768
 B      blocked      0.649 0.180 5.38   0.1954    1.102
 C      blocked      1.141 0.195 7.28   0.6847    1.598
 A      interleaved  0.189 0.194 7.15  -0.2666    0.645
 B      interleaved  0.497 0.179 5.31   0.0451    0.949
 C      interleaved  1.046 0.191 6.72   0.5907    1.502

It is these values that I have plotted and that I think is incorrect. Can someone please help me so that I get the correct estimated adjusted means?

After having read this, I suspect that the error is because pre.diff is a fixed value?

ref_grid(CI_post)

#result
'emmGrid' object with variables:
    pre.diff = 1.5065
    group = blocked, interleaved
    course = A, B, C

EDIT Following Lenth advice, I tried: post.diff.adj = post.diff + b * (1.506 - pre.diff), which gave me the following figure:

adjusted

It looks better and more correct. I used the model regression coefficient from my model:

Fixed effects:
                          Estimate Std. Error        df t value             Pr(>|t|)    
(Intercept)               -0.66087    0.18158   5.58701  -3.639             0.012280 *  
pre.diff                   0.64544    0.06178 130.60667  10.448 < 0.0000000000000002 ***
groupinterleaved          -0.12209    0.15189  65.38709  -0.804             0.424431    
courseB                    0.33714    0.09703 131.63603   3.475             0.000693 ***
courseC                    0.82993    0.16318 151.09201   5.086           0.00000107 ***
groupinterleaved:courseB  -0.02922    0.11777 101.47596  -0.248             0.804563    
groupinterleaved:courseC   0.02692    0.11763 100.29319   0.229             0.819435 

Then I used calculated it in my tibble:


dat <- dat %>%
  mutate(adjustedMean = (post.diff) + (0.6454358 * (1.506 - pre.diff)))

Then I plotted it with ggplot:

CI_post_plot <- ggplot(dat, aes(x = interaction(group, course), y = adjustedMean)) +
  geom_point(aes(color=group), size=1.5, position=position_jitter(width=0.1), alpha=0.7)+
  scale_y_continuous(name = "Time substracted from straight gliding time (sec.)", breaks = seq(-2, 6, 1)) +
  theme_pubr()+
  theme(legend.position="none",
        axis.title.x=element_blank()) +
  geom_hline(aes(yintercept=0), linetype = "dashed", size=0.2) + 
  scale_x_discrete(labels = c("Blocked\nCourse A", "Interleaved\nCourse A", "Blocked\nCourse B", "Interleaved\nCourse B", "Blocked\nCourse C",  "Interleaved\nCourse C")) 

CI_post_plot <- CI_post_plot + 
  geom_point(data = estmarg_mean, aes(x=interaction(group, course), y=emmean, group=group), size=2.5) +
  geom_errorbar(data = estmarg_mean, aes(x= interaction(group, course), y = emmean, ymin = lower.CL,ymax = upper.CL), width=0.1)


https://cran.r-project.org/web/packages/emmeans/vignettes/basics.html


Solution

  • Reviewing some comments, the second plot in the OP shows the adjusted response values and the adjusted means (AKA EMMs). The intuition for this is shown in this rough sketch: covariate-adjustment diagram Most experimental design texts will show a similar picture for how adjusted means are objained: The model fits parallel lines for each treatment; those lines go through the centers of their respective data clouds. The adjusted means are the estimates at the mean value of the covariate.

    To obtain the adjusted data, we do the same thing with each data point; a few representative points are shown. The adjusted data are projections of each data point onto the mean line, with the projections going on paths parallel to the regression lines.

    There was an article on "aligned data" in The American Statistician a number of years ago, and this is akin to that. I do not remember the author and didn't find it in a quick search. This is also related to "component-plus-residual" plots, discussed in some regression texts. The underlying idea is to remove the estimated effects of nuisance variables from the data, or equivalently to obtain the model residuals and add the non-nuisance effects.