I use emmeans to derive adjusted means from my linear mixed-effect regression model, but the results do not seem to be correct. I want to plot the model fit and the adjusted values of the individual data points, but the results look weird:
The estimated adjusted means seems to be too high for Course A and too low on Course C. In my linear mixed-effect regression, I am predicting the posttest with pretest as a covariate and the main effect and interaction of Group and Course. Because I have repeated measures on Course and different testing conditions, I have included a random intercept for Course and School. Using emmeans I get the following estimates:
# model fit
CI_post <- lmer(
post.diff ~
pre.diff +
group * course
+ (1|bib)
+ (1|school),
data = dat,
REML = FALSE)
#estimated adjusted means
emmeans(CI_post, specs = c("course", "group"),lmer.df = "satterthwaite")
# Results
course group emmean SE df lower.CL upper.CL
A blocked 0.311 0.191 6.65 -0.1452 0.768
B blocked 0.649 0.180 5.38 0.1954 1.102
C blocked 1.141 0.195 7.28 0.6847 1.598
A interleaved 0.189 0.194 7.15 -0.2666 0.645
B interleaved 0.497 0.179 5.31 0.0451 0.949
C interleaved 1.046 0.191 6.72 0.5907 1.502
It is these values that I have plotted and that I think is incorrect. Can someone please help me so that I get the correct estimated adjusted means?
After having read this, I suspect that the error is because pre.diff is a fixed value?
ref_grid(CI_post)
#result
'emmGrid' object with variables:
pre.diff = 1.5065
group = blocked, interleaved
course = A, B, C
EDIT Following Lenth advice, I tried: post.diff.adj = post.diff + b * (1.506 - pre.diff), which gave me the following figure:
It looks better and more correct. I used the model regression coefficient from my model:
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) -0.66087 0.18158 5.58701 -3.639 0.012280 *
pre.diff 0.64544 0.06178 130.60667 10.448 < 0.0000000000000002 ***
groupinterleaved -0.12209 0.15189 65.38709 -0.804 0.424431
courseB 0.33714 0.09703 131.63603 3.475 0.000693 ***
courseC 0.82993 0.16318 151.09201 5.086 0.00000107 ***
groupinterleaved:courseB -0.02922 0.11777 101.47596 -0.248 0.804563
groupinterleaved:courseC 0.02692 0.11763 100.29319 0.229 0.819435
Then I used calculated it in my tibble:
dat <- dat %>%
mutate(adjustedMean = (post.diff) + (0.6454358 * (1.506 - pre.diff)))
Then I plotted it with ggplot:
CI_post_plot <- ggplot(dat, aes(x = interaction(group, course), y = adjustedMean)) +
geom_point(aes(color=group), size=1.5, position=position_jitter(width=0.1), alpha=0.7)+
scale_y_continuous(name = "Time substracted from straight gliding time (sec.)", breaks = seq(-2, 6, 1)) +
theme_pubr()+
theme(legend.position="none",
axis.title.x=element_blank()) +
geom_hline(aes(yintercept=0), linetype = "dashed", size=0.2) +
scale_x_discrete(labels = c("Blocked\nCourse A", "Interleaved\nCourse A", "Blocked\nCourse B", "Interleaved\nCourse B", "Blocked\nCourse C", "Interleaved\nCourse C"))
CI_post_plot <- CI_post_plot +
geom_point(data = estmarg_mean, aes(x=interaction(group, course), y=emmean, group=group), size=2.5) +
geom_errorbar(data = estmarg_mean, aes(x= interaction(group, course), y = emmean, ymin = lower.CL,ymax = upper.CL), width=0.1)
https://cran.r-project.org/web/packages/emmeans/vignettes/basics.html
Reviewing some comments, the second plot in the OP shows the adjusted response values and the adjusted means (AKA EMMs). The intuition for this is shown in this rough sketch: Most experimental design texts will show a similar picture for how adjusted means are objained: The model fits parallel lines for each treatment; those lines go through the centers of their respective data clouds. The adjusted means are the estimates at the mean value of the covariate.
To obtain the adjusted data, we do the same thing with each data point; a few representative points are shown. The adjusted data are projections of each data point onto the mean line, with the projections going on paths parallel to the regression lines.
There was an article on "aligned data" in The American Statistician a number of years ago, and this is akin to that. I do not remember the author and didn't find it in a quick search. This is also related to "component-plus-residual" plots, discussed in some regression texts. The underlying idea is to remove the estimated effects of nuisance variables from the data, or equivalently to obtain the model residuals and add the non-nuisance effects.