I'm not sure whether this is more of a programming question or a statistical one (i.e., a gap in my understanding).
I have a Poisson mixed model that I want to use to compare average counts across groups at different time periods.
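(For reproducibility, here is a minimal simulated stand-in for dat — hypothetical values with the same structure, just so the code below can be run end-to-end; it will not reproduce the exact output shown:)

library(lme4)
library(emmeans)
set.seed(1)
dat <- data.frame(
  ID = factor(rep(1:60, each = 3)),        # 60 subjects, 3 time points each
  TX_GROUP = factor(rep(0:1, each = 90)),  # 30 subjects per treatment group
  time = factor(rep(1:3, times = 60)),
  Y = rpois(180, lambda = 4)               # placeholder counts
)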
mod <- glmer(Y ~ TX_GROUP * time + (1|ID), data = dat, family = poisson)
mod_em <- emmeans(mod, c("TX_GROUP","time"), type = "response")
 TX_GROUP time     rate        SE  df asymp.LCL asymp.UCL
        0    1 5.743158 0.4566671 Inf  4.914366  6.711723
        1    1 5.529303 0.4639790 Inf  4.690766  6.517741
        0    2 2.444541 0.2981097 Inf  1.924837  3.104564
        1    2 1.467247 0.2307103 Inf  1.078103  1.996855
        0    3 4.570218 0.4121428 Inf  3.829795  5.453790
        1    3 1.676827 0.2472920 Inf  1.255904  2.238826
Now, I want to estimate the marginal count for the combined time period (2 + 3) for each group. Is it not a simple case of exponentiating the sum of the logged counts from:
contrast(mod_em, list(`2 + 3` = c(0, 0, 1, 0, 1, 0)))
contrast(mod_em, list(`2 + 3` = c(0, 0, 0, 1, 0, 1)))
If I try that, the value does not come close to matching the simple mean over the combined periods.
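For reference, the "simple mean" I am comparing against is just the raw cell mean over the combined periods (a sketch assuming dat has the columns used in the model above):

# raw mean count per group over times 2 and 3 combined
aggregate(Y ~ TX_GROUP, data = subset(dat, time %in% c(2, 3)), FUN = mean)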
First, I suggest that you put both of your contrasts in one list, e.g.,
contr = list(`2+3|0` = c(0, 0, 1, 0, 1, 0),
             `2+3|1` = c(0, 0, 0, 1, 0, 1))
You have to decide when you want to back-transform. See the vignette on transformations and note the discussion on "timing is everything". The two basic options are:
One option: Obtain the marginal means of the log counts, and then back-transform:
mod_con = update(contrast(mod_em, contr), tran = "log")
summary(mod_con, type = "response")
[The update call is needed because contrast strips off transformations except in special cases, since it doesn't always know what scale to assign to arbitrary linear functions. For example, the difference of two square roots is not on a square-root scale.]
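To illustrate that last point numerically (my own toy numbers):

# sqrt(9) - sqrt(4) equals 1, but 1^2 = 1, not 9 - 4 = 5,
# so there is no square-root scale to assign to the difference
(sqrt(9) - sqrt(4))^2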
Second option: Back-transform the predictions, then sum them:
mod_emr = regrid(mod_em)   # predictions are back-transformed to the count scale here
contrast(mod_emr, contr)   # so the contrast combines actual rates
The distinction between these results is the same as the distinction between a geometric mean (option 1) and an arithmetic mean (option 2). I doubt that either of them will yield the same results as the raw marginal mean counts, because they are based on the predictions from your model. Personally, I think the first option is the better choice, because sums are a linear operation, and the model is linear on the log scale.
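Here is a toy numerical illustration of that distinction (made-up rates, not values from your model):

a <- 2.44; b <- 4.57
exp(mean(log(c(a, b))))   # option 1 flavor: geometric mean, about 3.34
mean(c(a, b))             # option 2 flavor: arithmetic mean, 3.505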
There is actually a third option, which is to create a grouping variable. I will illustrate with the pigs dataset.
> pigs.lm <- lm(log(conc) ~ source + factor(percent), data = pigs)
Here are the EMMs for percent:
> emmeans(pigs.lm, "percent")
 percent   emmean         SE df lower.CL upper.CL
       9 3.445307 0.04088810 23 3.360723 3.529890
      12 3.624861 0.03837600 23 3.545475 3.704248
      15 3.662706 0.04372996 23 3.572244 3.753168
      18 3.745156 0.05296030 23 3.635599 3.854713
Results are averaged over the levels of: source
Results are given on the log (not the response) scale.
Confidence level used: 0.95
Now let's create a grouping factor group:
> pigs.emm = add_grouping(ref_grid(pigs.lm), "group", "percent", c("1&2","1&2","3&4","3&4"))
> str(pigs.emm)
'emmGrid' object with variables:
    source = fish, soy, skim
    percent = 9, 12, 15, 18
    group = 1&2, 3&4
Nesting structure:  percent %in% group
Transformation: “log”
Now get the EMMs for group, and note they are just the averages of the respective levels:
> emmeans(pigs.emm, "group")
 group   emmean         SE df lower.CL upper.CL
 1&2   3.535084 0.02803816 23 3.477083 3.593085
 3&4   3.703931 0.03414907 23 3.633288 3.774574
Results are averaged over the levels of: source, percent
Results are given on the log (not the response) scale.
Confidence level used: 0.95
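You can verify the averaging by hand from the percent EMMs shown earlier:

> mean(c(3.445307, 3.624861))  # the 1&2 emmean
[1] 3.535084
> mean(c(3.662706, 3.745156))  # the 3&4 emmean
[1] 3.703931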
And here is a summary on the response scale:
> summary(.Last.value, type = "response")
 group response       SE df lower.CL upper.CL
 1&2   34.29790 0.961650 23 32.36517 36.34605
 3&4   40.60662 1.386678 23 37.83703 43.57893
Results are averaged over the levels of: source, percent
Confidence level used: 0.95
Intervals are back-transformed from the log scale
These are averages rather than sums, but otherwise it works, and the transformation doesn't get zapped like it does in contrast().
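To see the zapping in action — a sketch assuming the mod_em and contr objects from the first part of this answer:

con <- contrast(mod_em, contr)
summary(con, type = "response")  # transformation was stripped, so this stays on the log scale
summary(update(con, tran = "log"), type = "response")  # restored, so this back-transforms (option 1)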