Search code examples
rlme4mixed-modelsposthocemmeans

glmmTMB, post-hoc testing and glht


I am using glmmTMB to analyze a negative binomial generalized linear mixed model (GLMM) where the dependent variable is count data (CT), which is over-dispersed.

There are 115 samples (rows) in the relevant data frame. There are two fixed effects (F1, F2) and a random intercept (R), within which is nested a further random effect (NR). There is also an offset, consisting of the natural logarithm of the total counts in each sample (LOG_TOT).

An example of a data frame, df, is:

CT  F1  F2  R   NR  LOG_TOT
77  0   0   1   1   12.9
167 0   0   2   6   13.7
289 0   0   3   11  13.9
253 0   0   4   16  13.9
125 0   0   5   21  13.7
109 0   0   6   26  13.6
96  1   0   1   2   13.1
169 1   0   2   7   13.7
190 1   0   3   12  13.8
258 1   0   4   17  13.9
101 1   0   5   22  13.5
94  1   0   6   27  13.5
89  1   25  1   4   13.0
166 1   25  2   9   13.6
175 1   25  3   14  13.7
221 1   25  4   19  13.8
131 1   25  5   24  13.5
118 1   25  6   29  13.6
58  1   75  1   5   12.9
123 1   75  2   10  13.4
197 1   75  3   15  13.7
208 1   75  4   20  13.8
113 1   8   1   3   13.2
125 1   8   2   8   13.7
182 1   8   3   13  13.7
224 1   8   4   18  13.9
104 1   8   5   23  13.5
116 1   8   6   28  13.7
122 2   0   1   2   13.1
115 2   0   2   7   13.6
149 2   0   3   12  13.7
270 2   0   4   17  14.1
116 2   0   5   22  13.5
94  2   0   6   27  13.7
73  2   25  1   4   12.8
61  2   25  2   9   13.0
185 2   25  3   14  13.8
159 2   25  4   19  13.7
125 2   25  5   24  13.6
75  2   25  6   29  13.5
121 2   8   1   3   13.0
143 2   8   2   8   13.8
219 2   8   3   13  13.9
191 2   8   4   18  13.7
98  2   8   5   23  13.5
115 2   8   6   28  13.6
110 3   0   1   2   12.8
123 3   0   2   7   13.6
210 3   0   3   12  13.9
354 3   0   4   17  14.4
160 3   0   5   22  13.7
101 3   0   6   27  13.6
69  3   25  1   4   12.6
112 3   25  2   9   13.5
258 3   25  3   14  13.8
174 3   25  4   19  13.5
171 3   25  5   24  13.9
117 3   25  6   29  13.7
38  3   75  1   5   12.1
222 3   75  2   10  14.1
204 3   75  3   15  13.5
235 3   75  4   20  13.7
241 3   75  5   25  13.8
141 3   75  6   30  13.9
113 3   8   1   3   12.9
90  3   8   2   8   13.5
276 3   8   3   13  14.1
199 3   8   4   18  13.8
111 3   8   5   23  13.6
109 3   8   6   28  13.7
135 4   0   1   2   13.1
144 4   0   2   7   13.6
289 4   0   3   12  14.2
395 4   0   4   17  14.6
154 4   0   5   22  13.7
148 4   0   6   27  13.8
58  4   25  1   4   12.8
136 4   25  2   9   13.8
288 4   25  3   14  14.0
113 4   25  4   19  13.5
162 4   25  5   24  13.7
172 4   25  6   29  14.1
2   4   75  1   5   12.3
246 4   75  3   15  13.7
247 4   75  4   20  13.9
114 4   8   1   3   13.1
107 4   8   2   8   13.6
209 4   8   3   13  14.0
190 4   8   4   18  13.9
127 4   8   5   23  13.5
101 4   8   6   28  13.7
167 6   0   1   2   13.4
131 6   0   2   7   13.5
369 6   0   3   12  14.5
434 6   0   4   17  14.9
172 6   0   5   22  13.8
126 6   0   6   27  13.8
90  6   25  1   4   13.1
172 6   25  2   9   13.7
330 6   25  3   14  14.2
131 6   25  4   19  13.7
151 6   25  5   24  13.9
141 6   25  6   29  14.2
7   6   75  1   5   12.2
194 6   75  2   10  14.2
280 6   75  3   15  13.7
253 6   75  4   20  13.8
45  6   75  5   25  13.4
155 6   75  6   30  13.9
208 6   8   1   3   13.5
97  6   8   2   8   13.5
325 6   8   3   13  14.3
235 6   8   4   18  14.1
112 6   8   5   23  13.6
188 6   8   6   28  14.1

The random and nested random effects are treated as factors. The fixed effect F1 has the value 0, 1, 2, 3, 4 and 6. The fixed effect F2 has the values 0, 8, 25 and 75. I am treating the fixed effects as continuous, rather than ordinal, because I would like to identify monotonic unidirectional changes in the dependent variable CT rather than up and down changes.

I previously used the lme4 package to analyze the data as a mixed model:

library(lme4)

m1 <- lmer(CT ~ F1*F2 + (1|R/NR) +
offset(LOG_TOT), data = df, verbose=FALSE)

Followed by the use of glht in the multcomp package for post-hoc analysis employing the formula approach:

library(multcomp)

glht_fixed1 <- glht(m1, linfct = c(
"F1 == 0",
"F1 + 8*F1:F2 == 0",
"F1 + 25*F1:F2 == 0",
"F1 + 75*F1:F2 == 0",
"F1 + (27)*F1:F2 == 0"))

glht_fixed2 <- glht(m1, linfct = c(
"F2 + 1*F1:F2 == 0",
"F2 + 2*F1:F2 == 0",
"F2 + 3*F1:F2 == 0",
"F2 + 4*F1:F2 == 0",
"F2 + 6*F1:F2 == 0",
"F2 + (3.2)*F1:F2 == 0"))

glht_omni <- glht(m1)

Here is the corresponding negative binomial glmmTMB model, which I now prefer:

library(glmmTMB)

m2 <- glmmTMB(CT ~ F1*F2 + (1|R/NR) + 
offset(LOG_TOT), data = df, verbose=FALSE, family="nbinom2")

According to this suggestion by Ben Bolker (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2017q3/025813.html), the best approach to post hoc testing with glmmTMB is to use lsmeans (?or its more recent equivalent, emmeans).

I follwed Ben's suggestion, running

source(system.file("other_methods","lsmeans_methods.R",package="glmmTMB"))

and I can then use emmeans on the glmmTMB object. For example,

as.glht(emmeans(m2,~(F1 + 27*F1:F2)))

General Linear Hypotheses

Linear Hypotheses:
Estimate
3.11304347826087, 21 == 0 -8.813

But this does not seem correct. I can also change F1 and F2 to factors and then try this:

as.glht(emmeans(m2,~(week + 27*week:conc)))

     General Linear Hypotheses

Linear Hypotheses:
           Estimate
0, 0 == 0    -6.721
1, 0 == 0    -6.621
2, 0 == 0    -6.342
3, 0 == 0    -6.740
4, 0 == 0    -6.474
6, 0 == 0    -6.967
0, 8 == 0    -6.694
1, 8 == 0    -6.651
2, 8 == 0    -6.227
3, 8 == 0    -6.812
4, 8 == 0    -6.371
6, 8 == 0    -6.920
0, 25 == 0   -6.653
1, 25 == 0   -6.648
2, 25 == 0   -6.282
3, 25 == 0   -6.766
4, 25 == 0   -6.338
6, 25 == 0   -6.702
0, 75 == 0   -6.470
1, 75 == 0   -6.642
2, 75 == 0   -6.091
3, 75 == 0   -6.531
4, 75 == 0   -5.762
6, 75 == 0   -6.612

But, again, I am not sure how to bend this output to my will. If some kind person could tell me how to correctly carry over the use of formulae in glht and linfct to the emmeans scenario with glmmTMB, I would be very grateful. I have read all the manuals and vignettes until I am blue in face (or it feels that way, at least), but I am still at a loss. In my defense (culpability?) I am a statistical tyro, so many apologies if I am asking a question with very obvious answers here.

The glht software and post hoc testing carries directly over to the glmmADMB package, but glmmADMB is 10x slower than glmmTMB. I need to perform multiple runs of this analysis, each with 300,000 examples of the negative binomial mixed model, so speed is essential.

Many thanks for your suggestions and help!


Solution

  • Following the advice of my excellent statistical consultant, I think the solution below provides what I had previously obtained using glht and linfct.

    The slopes for F1 are calculated at the various levels of F2 by using contrast and emmeans to compute the differences in the dependendent variable between two values of F1 separated by one unit (i.e. c(0,1)). (Since the regression is linear, the two values of F1 are arbitrary, provided they are separated by one unit, eg c(3,4)). Vice versa for the slopes of F2.

    Thus, slopes of F1 at F2 = 0, 8, 25, 75 and 27 (27 is average of F2):

    contrast(emmeans(m1, specs="F1", at=list(F1=c(0,1), F2=0)),list(c(-1,1)))
            (above equivalent to: summary(m1)$coefficients$cond["F1",])
            contrast(emmeans(m1, specs="F1", at=list(F1=c(0,1), F2=8)),list(c(-1,1)))
            contrast(emmeans(m1, specs="F1", at=list(F1=c(0,1), F2=25)),list(c(-1,1)))
            contrast(emmeans(m1, specs="F1", at=list(F1=c(0,1), F2=75)),list(c(-1,1)))
            contrast(emmeans(m1, specs="F1", at=list(F1=c(0,1), F2=27)),list(c(-1,1)))
    

    and slopes of F2 at F1 = 1, 2, 3, 4, 6 and 3.2 (3.2 is average of F1, excluding zero value):

    contrast(emmeans(m1, specs="F2", at=list(F2=c(0,1), F1=0)),list(c(-1,1)))
    (above equivalent to: summary(m1)$coefficients$cond["F2",])
    contrast(emmeans(m1, specs="F2", at=list(F2=c(0,1), F1=1)),list(c(-1,1)))
    contrast(emmeans(m1, specs="F2", at=list(F2=c(0,1), F1=2)),list(c(-1,1)))
    contrast(emmeans(m1, specs="F2", at=list(F2=c(0,1), F1=3)),list(c(-1,1)))
    contrast(emmeans(m1, specs="F2", at=list(F2=c(0,1), F1=4)),list(c(-1,1)))
    contrast(emmeans(m1, specs="F2", at=list(F2=c(0,1), F1=6)),list(c(-1,1)))
    contrast(emmeans(m1, specs="F2", at=list(F2=c(0,1), F1=3.2)),list(c(-1,1)))
    

    Interaction of F1 and F2 slopes at F1 = 0 and F2 = 0

    contrast(emmeans(m1, specs=c("F1","F2"), at=list(F1=c(0,1),F2=c(0,1))),list(c(1,-1,-1,1)))
    (above equivalent to: summary(m1)$coefficients$cond["F1:F2",])
    

    From the resulting emmGrid objects provided from contrast(), one can pick out as desired the estimate of the slope (estimate), standard deviation of the estimated slope (SE), Z score for the difference of the estimated slope from a null hypothesized slope of zero (z.ratio, calculated by emmGrid from estimate divided by SE) and corresponding P value (p.value calculated by emmGrid as 2*pnorm(-abs(z.ratio)).

    For example:

    contrast(emmeans(m1, specs="F1", at=list(F2=c(0,1), F1=0)),list(c(-1,1)))
    

    yields:

    NOTE: Results may be misleading due to involvement in interactions
     contrast    estimate          SE df z.ratio p.value
     c(-1, 1) 0.001971714 0.002616634 NA   0.754  0.4511
    

    Postscript added 1.25 yrs later:

    The above gives the correct solutions, but as Russell Lenth pointed out the answers are more easily obtained using emtrends. However, I have selected this answer as being correct since there may be have some didactic value in showing how to calculate slopes using emmeans to find the resulting change in the predicted dependent variable when the independent variable changes by 1.