I fitted two GLMM models (full models below), one with a Poisson distribution (number of eggs) and the other with binomial distribution (number of surviving offspring / number of eggs), and then I retrieved the full average coefficients (summary below).
I would like to plot the regression lines and confidence intervals for some of the fixed terms (from the averaged models) however I am really not sure if I am doing it the right way.
Below is what I did, and I noticed that #1 and #2 are a bit different. Given they come from the same model can someone give me some hints of what I am missing?
I am also wondering whether the fact that my models are standardized influences the way I should be plotting the regression lines?
This is what my df looks like:
> dput(head(dataRS.dist, 10))
structure(list(MaleID = c("875626", "871031",
"871031", "877334", "871895", "877906", "870404", "861285", "861285", "873932"
), FemaleID = c("861255", "861255", "861255", "861255", "861503",
"861503", "861525", "861525", "861525", "861621"), Year = c(1996L, 1997L,
1998L, 1999L, 1997L, 2000L, 1996L, 1997L, 1998L, 1998L), AgeM = c(1L,
5L, 6L, 2L, 2L, 1L, 3L, 3L, 4L, 3L), AgeF = c(2L, 3L, 4L, 5L, 3L, 6L, 3L,
4L, 5L, 3L), LayingDate = c(165L, 122L, 99L, 102L, 116L, 112L, 106L, 162L,
106L, 98L), LD_QUA = c(27225L, 14884L, 9801L, 10404L, 13456L, 12544L,
11236L, 26244L, 11236L, 9604L), Breed_Suc = c(1, 0.833, 1, 1, 0.667, 0.875,
0.6, 0.8, 0.6, 0.667), NbE = c(7L, 6L, 4L, 3L, 6L, 8L, 5L, 5L, 5L, 6L),
M_distance = c(5.011, 14.146, 14.146, 12.011, 8.36, 11.236, 11.135, 11.236,
11.236, 6.989), M_distance_QUA = c(25.112, 200.111, 200.111, 144.27, 69.882,
126.247, 123.984, 126.247, 126.247, 48.843), F_distance = c(1, 1, 1, 1,
6.438, 6.438, 10.371, 10.371, 10.371, 14.079), F_distance_QUA = c(1, 1, 1, 1,
41.45, 41.45, 107.553, 107.553, 107.553, 198.208), scale.LD = structure(c(1.49006909403421,
0.231226641474467, -0.44210769361563, -0.354281475995183, 0.0555742062335718,
-0.061527417260358, -0.237179852501253, 1.40224287641377, -0.237179852501253,
-0.471383099489113), .Dim = c(10L, 1L)), scale.LD_Quad = structure(c(1.50346602030487,
0.0806268589946397, -0.505410857311973, -0.43588877528701, -0.084012499432636,
-0.189160324982829, -0.339964443206132, 1.39036293163739, -0.339964443206132,
-0.528123709980161), .Dim = c(10L, 1L))), row.names = c(9L, 10L, 11L, 12L, 19L,
21L, 22L, 23L, 24L, 28L), class = "data.frame")
Model with poisson distribution:
full.model.dist.nb = glmer(NbE ~ M_distance + M_distance_QUA + F_distance +
F_distance_QUA + M_distance*F_distance + AgeF + AgeM + scale.LD +
scale.LD_Quad + (1|Year) + (1|MaleID) + (1|FemaleID), data = dataRS.dist,
na.action = na.fail, nAGQ = 0, family = "poisson"(link="log"))
MuMIn::dredge() ...
summary(nb.dist.avg.model)
Call:
model.avg(object = best.nb.dist.models)
Component model call:
glmer(formula = NbE ~ <5 unique rhs>, data = dataRS.dist, family =
poisson(link = "log"), nAGQ = 0, na.action = na.fail)
Component models:
df logLik AICc delta weight
1267 8 -1829.48 3675.12 0.00 0.27
12367 9 -1828.48 3675.16 0.05 0.26
123567 10 -1827.90 3676.05 0.93 0.17
12567 9 -1828.97 3676.14 1.02 0.16
123467 10 -1828.06 3676.37 1.26 0.14
Term codes:
z.AgeF z.AgeM z.F_distance z.F_distance_QUA
1 2 3 4
z.M_distance z.scale.LD z.scale.LD_Quad
5 6 7
Model-averaged coefficients:
(full average)
Estimate Std. Error Adjusted SE z value Pr(>|z|)
(Intercept) 1.827949 0.017495 0.017519 104.342 <2e-16 ***
z.AgeF 0.046835 0.028607 0.028646 1.635 0.1021
z.AgeM -0.053181 0.029231 0.029271 1.817 0.0692 .
z.scale.LD -0.173759 0.178627 0.178871 0.971 0.3313
z.scale.LD_Quad 0.230000 0.176936 0.177178 1.298 0.1942
z.F_distance -0.030771 0.046648 0.046679 0.659 0.5098
z.M_distance -0.009227 0.020272 0.020288 0.455 0.6492
z.F_distance_QUA 0.009579 0.036307 0.036336 0.264 0.7921
(conditional average)
Estimate Std. Error Adjusted SE z value Pr(>|z|)
(Intercept) 1.82795 0.01749 0.01752 104.342 <2e-16 ***
z.AgeF 0.04683 0.02861 0.02865 1.635 0.1021
z.AgeM -0.05318 0.02923 0.02927 1.817 0.0692 .
z.scale.LD -0.17376 0.17863 0.17887 0.971 0.3313
z.scale.LD_Quad 0.23000 0.17694 0.17718 1.298 0.1942
z.F_distance -0.05378 0.05065 0.05070 1.061 0.2888
z.M_distance -0.02808 0.02686 0.02689 1.044 0.2964
z.F_distance_QUA 0.06716 0.07331 0.07341 0.915 0.3603
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
For this model I plotted these:
stat_smooth
and method="glm"
seemed okay, especially cause I wanted to plot the CI as well. Is this correct? (the figure below de code shows the plot i did for the male and the female, separately)Model.avg.pred.dist.nb <- c(predict(nb.dist.avg.model, type = "response" )) #get the fitted values from the average model AICc<2
DF_Model.avg.pred.dist.nb <- data.frame(Model.avg.pred.dist.nb, subset(dataRS.dist, select = c(M_distance, F_distance, NbE))) #create the dataframe with the variables to plot
NbE_M_dist = ggplot(DF_Model.avg.pred.dist.nb, aes(x=M_distance, y=Model.avg.pred.dist.nb)) +
geom_point(size=0.5) +
stat_smooth(method = "glm", method.args = list(family = poisson), se = T, alpha = .25, color = "black") +
labs(x="Male distance", y = "Number eggs") +
theme_bw() +
theme(text=element_text(size=11))
geom_abline
. How can I add the CI as well?ggplot() +
scale_x_continuous(name="Distance", limits=c(0,20)) +
scale_y_continuous(name="Number eggs", limits=c(5,7.5)) +
scale_color_discrete(name="Sex") +
geom_abline(aes(intercept=exp(1.827949241), slope=-0.030771247, colour="Female")) +
geom_abline(aes(intercept=exp(1.827949241), slope=-0.009227452, colour="Male")) +
theme_bw()
For the binomial model, I did this (taken from here):
full.model.dist.bs = glmer(Breed_Suc ~ M_distance + M_distance_QUA + F_distance +
F_distance_QUA + M_distance*F_distance + AgeF + AgeM + scale.LD + scale.LD_Quad +
(1|Year) + (1|MaleID)+ (1|FemaleID), data = dataRS.dist, na.action = na.fail ,
nAGQ = 0, weights = dataRS.dist$NbE,family = "binomial"(link="logit"))
summary(bs.dist.avg.model)
Call:
model.avg(object = best.bs.dist.models)
Component model call:
glmer(formula = Breed_Suc ~ <6 unique rhs>, data = dataRS.dist, family =
binomial(link = "logit"), nAGQ = 0, weights = dataRS.dist$NbE, na.action =
na.fail)
Component models:
df logLik AICc delta weight
12356789 12 -1727.73 3479.82 0.00 0.25
125678 10 -1729.89 3480.03 0.21 0.22
123456789 13 -1726.96 3480.34 0.51 0.19
1278 8 -1732.43 3481.03 1.21 0.13
1235789 11 -1729.52 3481.33 1.51 0.12
12345789 12 -1728.72 3481.80 1.98 0.09
Term codes:
z.AgeF z.AgeM z.F_distance
1 2 3
z.F_distance_QUA z.M_distance z.M_distance_QUA
4 5 6
z.scale.LD z.scale.LD_Quad z.F_distance:z.M_distance
7 8 9
Model-averaged coefficients:
(full average)
Estimate Std. Error Adjusted SE z value Pr(>|z|)
(Intercept) 0.59571 0.08768 0.08780 6.785 <2e-16 ***
z.F_distance -0.08331 0.22438 0.22455 0.371 0.7106
z.M_distance 0.30106 0.35661 0.35686 0.844 0.3989
z.M_distance_QUA -0.38521 0.37397 0.37420 1.029 0.3033
z.AgeF 0.23875 0.11164 0.11180 2.136 0.0327 *
z.AgeM 0.07179 0.10853 0.10868 0.661 0.5089
z.scale.LD -0.21892 0.51109 0.51179 0.428 0.6688
z.scale.LD_Quad -0.21826 0.50536 0.50605 0.431 0.6662
z.F_distance:z.M_distance 0.23584 0.22598 0.22610 1.043 0.2969
z.F_distance_QUA 0.10247 0.22963 0.22979 0.446 0.6556
(conditional average)
Estimate Std. Error Adjusted SE z value Pr(>|z|)
(Intercept) 0.59571 0.08768 0.08780 6.785 <2e-16 ***
z.F_distance -0.12943 0.26879 0.26902 0.481 0.6304
z.M_distance 0.34787 0.36147 0.36176 0.962 0.3362
z.M_distance_QUA -0.58524 0.30888 0.30930 1.892 0.0585 .
z.AgeF 0.23875 0.11164 0.11180 2.136 0.0327 *
z.AgeM 0.07179 0.10853 0.10868 0.661 0.5089
z.scale.LD -0.21892 0.51109 0.51179 0.428 0.6688
z.scale.LD_Quad -0.21826 0.50536 0.50605 0.431 0.6662
z.F_distance:z.M_distance 0.36639 0.17748 0.17772 2.062 0.0392 *
z.F_distance_QUA 0.36356 0.30358 0.30400 1.196 0.2317
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
coefs <- coef(bs.dist.avg.model)
x_plot <- seq(0, 20, by = 0.1)
y_plot <- plogis(coefs[1] + coefs[2] * x_plot) # coefs[1]= 0.5957089; coefs[2]=-0.08330994
y_plot2 <- plogis(coefs[1] + coefs[3] * x_plot) # coefs[1]= 0.5957089; coefs[3]=0.3010592
plot_data <- data.frame(x_plot, y_plot,y_plot2)
ggplot(plot_data) +
geom_line(aes(x_plot, y_plot), col = "red") +
geom_line(aes(x_plot, y_plot2), col = "blue") +
xlab("Distance") + ylab("Breeding success") +
scale_y_continuous(limits = c(0, 1)) +
theme_bw()
If you have any suggestion on how to do it differently, it is very much welcome! Thanks in advance.
EDITED: with what I hope is a reproducible example (if not, I am sorry, then I don't understand what a reproducible example is)
I can sort of understand plotting the predicted values as points, although it might make more sense to show either the original data points or the partial residuals (e.g. the remef package on GitHub). Plotting a Poisson GLM through the predicted values is a little bit weird: while the GLM fit through the predicted values might be approximately the same as the predictions of the full model (although I'd have trouble proving it), but the confidence intervals are not the same as the confidence intervals on the predictions from the averaged model.
The prediction from a fitted log-link (Poisson) model is y = exp(a + b*x)
, not y=exp(a) + b*x
(which looks like what you're doing here).
This looks OK, although it will only give sensible predictions if all of your other covariates are zero-centred. (And you don't get confidence intervals, and there are easier ways to do this.)
You can use the ggeffects
package to get predictions and confidence intervals, with the caveat that the confidence intervals don't incorporate any uncertainty in the random effects (and the caveat that I haven't looked into how these are computed, so I'm not sure how they propagate the model-averaging uncertainty).
I used an interaction term in the model: as far as I can tell you set up explicit dummy variables for the interaction.
set.seed(101)
n <- 1000
ng <- 20
dd <- data.frame(x = rnorm(n),
y = rnorm(n),
sex = factor(rep(c("F","M"), length.out = n)),
f = factor(rep(1:ng, length.out = n)))
library(lme4)
library(MuMIn)
dd$resp <- simulate(~ x*sex + y + (1|f),
newdata = dd,
newparams = list(beta = rep(1,5),
theta = 1),
family = poisson)[[1]]
full_model <- glmer(resp ~ x*sex + y + (1|f), data = dd,
family = poisson, na.action = na.fail)
dr <- dredge(full_model)
avg_model <- model.avg(dr, fit = TRUE)
I tried the sjPlot
and ggeffects
packages but in the end found it easier to do what I wanted with emmeans
.
library(emmeans)
g1 <- emmeans(avg_model, specs = ~x*sex,
type = "response",
data = dd,
at = list(x = seq(-4, 4, length.out = 101)))
g2 <- as.data.frame(g1)
library(ggplot2); theme_set(theme_bw())
gg0 <- ggplot(dd, aes(x, resp, colour = sex)) +
## scale_y_log10(limits = c(0.1, 1e4)) +
geom_point() +
geom_line(data = g2, aes(y = rate)) +
geom_ribbon(data = g2, aes(
y = rate, ymin = lower.CL, ymax = upper.CL, fill = sex),
colour = NA, alpha = 0.3)
print(gg0)
print(gg0 + scale_y_log10(limits=c(0.1, 1e4), oob = scales::squish))