I'm running a regression model with multiple factor and continuous predictors. I need to follow up with multiple comparisons. Following this post, I'm able to run the emmeans
correctly, and I seem to get appropriate pairwise comparisons. However, I fail when I try to get the CLD output. Any suggestions would be welcome.
# part of my dataset
df.sub <- structure(list(Year = c(2014, 2014, 2014, 2014, 2014, 2014, 2014,
2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,
2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,
2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2015,
2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015,
2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015,
2015, 2015, 2015, 2015, 2015, 2015), Transect = structure(c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("Transect1", "Transect2", "Transect3",
"Transect4"), class = "factor"), Dist = c(450, 450, 450,
450, 986, 986, 986, 986, 986, 1996, 1996, 1996, 1996, 4082, 4082,
4082, 72, 72, 72, 72, 72, 292, 292, 292, 292, 555, 555, 555,
555, 555, 1055, 1055, 1055, 1055, 1650, 1650, 1650, 1650, 1650,
450, 450, 450, 987, 987, 987, 1994, 1994, 1994, 4078, 4078, 4078,
120, 120, 120, 325, 325, 325, 560, 560, 560, 1070, 1070, 1070,
1070, 1650, 1650, 1650, 1650), Response = c(12000, 13000, 12000,
12000, 13000, 13000, 13000, 12000, 13000, 13000, 13000, 12000,
12000, 9600, 11000, 10000, 6100, 8400, 5500, 6100, 6300, 7200,
7200, 6800, 6700, 7800, 6800, 6400, 6000, 5700, 8300, 7900, 8400,
8200, 9000, 9900, 7900, 8100, 7600, 12000, 14000, 12000, 13000,
14000, 14000, 14000, 12000, 15000, 13000, 12000, 11000, 8400,
9600, 8700, 7300, 7300, 7100, 5900, 7100, 6500, 8600, 8100, 7800,
7400, 10000, 9800, 7500, 8500), Covariate = c(67, 49, 62, 70, 73,
60, 61, 68, 72, 54, 44, 43, 41, 52, 44, 47, 9.4, 18.3, 10.3,
14.4, 13.9, 14, 18.3, 10.7, 12, 23.4, 27.1, 11.6, 8.6, 8.8, 34.6,
36, 38, 30.7, 40.9, 41, 35.3, 25.7, 23.7, 73, 72, 72, 62, 73,
73, 59, 51, 63, 55, 50, 46, 20.9, 36.9, 24.5, 27.6, 29.4, 28,
14.5, 27.4, 17, 34.7, 38.8, 39, 34.1, 55.2, 56, 44.6, 35.9)), row.names = c(NA,
-68L), class = c("tbl_df", "tbl", "data.frame"))
Analysis:
library(dplyr)
library(tidyr)
library(emmeans)
# data adjustment
df.sub$TransDist <- log(df.sub$Dist + 1)
df.sub$YearFac <- as.factor(df.sub$Year)
df.sub$Transect <- droplevels(df.sub$Transect)
m <- lm(log(Response) ~ poly(TransDist, 2) * YearFac * Transect +
log(Covariate), data = df.sub)
# prediction grid:
new <- unique(select(df.sub, Transect, YearFac)) %>%
crossing(Dist = c(0, 500, 1000, 4000)) %>%
filter(Dist < 4000 | Transect != "Transect3") %>%
mutate(TransDist = log(Dist + 1),
Covariate = rnorm(n(), 50, 5))
termsX <- terms(model.frame(m, data = df.sub))
X_new2 <- model.matrix(delete.response(termsX), data = new)
beta = coef(m)
new$pred <- X_new2 %*% beta
# now predict with emmeans and compare
ems <- emmeans(m, ~YearFac | Transect * TransDist + Covariate, data = new,
covnest = TRUE, cov.reduce = FALSE)
as.data.frame(ems) %>%
select(Transect, YearFac, TransDist, emmean) %>%
right_join(new) ## when looking at the output, the values are identical, which is great
Now calculate pairwise comparisons. This works (based on manual comparison to output above).
contrast(ems, "pairwise", by = c("Transect", "TransDist"),
data = new, covnest = TRUE, cov.reduce = FALSE)
However, when I run CLD
, I get an error...
CLD(ems)
Error in x@linfct[i, , drop = FALSE] : subscript out of bounds
There is a very subtle error... Try:
> CLD(ems, sort = FALSE)
Transect = Transect2, TransDist = 0.00:
Covariate YearFac emmean SE df lower.CL upper.CL .group
45.6 2014 9.08 0.4854 55 8.11 10.06 1
44.1 2015 13.01 0.9365 55 11.13 14.89 2
Transect = Transect2, TransDist = 6.22:
Covariate YearFac emmean SE df lower.CL upper.CL .group
52.3 2014 9.11 0.0485 55 9.01 9.20 1
54.2 2015 9.01 0.0402 55 8.93 9.09 1
... etc. ...
Objects with nesting carry a complete reference grid internally, along with flags to determine which nodes are relevant. In this example, that complete grid has 224 elements, only 14 of which are to be displayed. The sorting code picks out 14 (wrong) rows, but later thinks there are still 224 because the flags are still active.
More explanation than is probably needed, but I have tracked that down and found a way to work with it. The next github push will have the correction. Also, I expect to update it on CRAN in the next week or two.