Search code examples
rggplot2lme4predictglmm

Plotting regression lines - full average GLMM


I fitted two GLMM models (full models below), one with a Poisson distribution (number of eggs) and the other with binomial distribution (number of surviving offspring / number of eggs), and then I retrieved the full average coefficients (summary below).

I would like to plot the regression lines and confidence intervals for some of the fixed terms (from the averaged models) however I am really not sure if I am doing it the right way.

Below is what I did, and I noticed that #1 and #2 are a bit different. Given they come from the same model can someone give me some hints of what I am missing?

I am also wondering whether the fact that my models are standardized influences the way I should be plotting the regression lines?

This is what my df looks like:

> dput(head(dataRS.dist, 10)) 
structure(list(MaleID = c("875626", "871031", 
"871031", "877334", "871895", "877906", "870404", "861285", "861285", "873932"
), FemaleID = c("861255", "861255", "861255", "861255", "861503", 
"861503", "861525", "861525", "861525", "861621"), Year = c(1996L, 1997L, 
1998L, 1999L, 1997L, 2000L, 1996L, 1997L, 1998L, 1998L), AgeM = c(1L, 
5L, 6L, 2L, 2L, 1L, 3L, 3L, 4L, 3L), AgeF = c(2L, 3L, 4L, 5L, 3L, 6L, 3L, 
4L, 5L, 3L), LayingDate = c(165L, 122L, 99L, 102L, 116L, 112L, 106L, 162L, 
106L, 98L), LD_QUA = c(27225L, 14884L, 9801L, 10404L, 13456L, 12544L, 
11236L, 26244L, 11236L, 9604L), Breed_Suc = c(1, 0.833, 1, 1, 0.667, 0.875, 
0.6, 0.8, 0.6, 0.667), NbE = c(7L, 6L, 4L, 3L, 6L, 8L, 5L, 5L, 5L, 6L), 
M_distance = c(5.011, 14.146, 14.146, 12.011, 8.36, 11.236, 11.135, 11.236, 
11.236, 6.989), M_distance_QUA = c(25.112, 200.111, 200.111, 144.27, 69.882, 
126.247, 123.984, 126.247, 126.247, 48.843), F_distance = c(1, 1, 1, 1, 
6.438, 6.438, 10.371, 10.371, 10.371, 14.079), F_distance_QUA = c(1, 1, 1, 1, 
41.45, 41.45, 107.553, 107.553, 107.553, 198.208), scale.LD = structure(c(1.49006909403421, 
0.231226641474467, -0.44210769361563, -0.354281475995183, 0.0555742062335718, 
-0.061527417260358, -0.237179852501253, 1.40224287641377, -0.237179852501253, 
-0.471383099489113), .Dim = c(10L, 1L)), scale.LD_Quad = structure(c(1.50346602030487, 
0.0806268589946397, -0.505410857311973, -0.43588877528701, -0.084012499432636, 
-0.189160324982829, -0.339964443206132, 1.39036293163739, -0.339964443206132, 
-0.528123709980161), .Dim = c(10L, 1L))), row.names = c(9L, 10L, 11L, 12L, 19L, 
21L, 22L, 23L, 24L, 28L), class = "data.frame")

Model with poisson distribution:

full.model.dist.nb = glmer(NbE ~ M_distance + M_distance_QUA + F_distance + 
    F_distance_QUA + M_distance*F_distance + AgeF + AgeM + scale.LD +
    scale.LD_Quad + (1|Year) + (1|MaleID) + (1|FemaleID), data = dataRS.dist, 
    na.action = na.fail, nAGQ = 0, family = "poisson"(link="log"))

MuMIn::dredge() ...

summary(nb.dist.avg.model)

Call:
model.avg(object = best.nb.dist.models)

Component model call: 
glmer(formula = NbE ~ <5 unique rhs>, data = dataRS.dist, family = 
     poisson(link = "log"), nAGQ = 0, na.action = na.fail)

Component models: 
       df   logLik    AICc delta weight
1267    8 -1829.48 3675.12  0.00   0.27
12367   9 -1828.48 3675.16  0.05   0.26
123567 10 -1827.90 3676.05  0.93   0.17
12567   9 -1828.97 3676.14  1.02   0.16
123467 10 -1828.06 3676.37  1.26   0.14

Term codes: 
           z.AgeF           z.AgeM     z.F_distance z.F_distance_QUA 
                1                2                3                4 
     z.M_distance       z.scale.LD  z.scale.LD_Quad 
                5                6                7 

Model-averaged coefficients:  
(full average) 
                  Estimate Std. Error Adjusted SE z value Pr(>|z|)    
(Intercept)       1.827949   0.017495    0.017519 104.342   <2e-16 ***
z.AgeF            0.046835   0.028607    0.028646   1.635   0.1021    
z.AgeM           -0.053181   0.029231    0.029271   1.817   0.0692 .  
z.scale.LD       -0.173759   0.178627    0.178871   0.971   0.3313    
z.scale.LD_Quad   0.230000   0.176936    0.177178   1.298   0.1942    
z.F_distance     -0.030771   0.046648    0.046679   0.659   0.5098    
z.M_distance     -0.009227   0.020272    0.020288   0.455   0.6492    
z.F_distance_QUA  0.009579   0.036307    0.036336   0.264   0.7921    
 
(conditional average) 
                  Estimate Std. Error Adjusted SE z value Pr(>|z|)    
(Intercept)       1.82795    0.01749     0.01752 104.342    <2e-16 ***
z.AgeF            0.04683    0.02861     0.02865   1.635    0.1021    
z.AgeM           -0.05318    0.02923     0.02927   1.817    0.0692 .  
z.scale.LD       -0.17376    0.17863     0.17887   0.971    0.3313    
z.scale.LD_Quad   0.23000    0.17694     0.17718   1.298    0.1942    
z.F_distance     -0.05378    0.05065     0.05070   1.061    0.2888    
z.M_distance     -0.02808    0.02686     0.02689   1.044    0.2964    
z.F_distance_QUA  0.06716    0.07331     0.07341   0.915    0.3603    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

For this model I plotted these:

  1. Here I first get the predicted values and then plot the regression line of the predicted values and my variable of interest, and also the CI . Given that I do it the predicted values, using a the stat_smooth and method="glm" seemed okay, especially cause I wanted to plot the CI as well. Is this correct? (the figure below de code shows the plot i did for the male and the female, separately)
Model.avg.pred.dist.nb <- c(predict(nb.dist.avg.model, type = "response" )) #get the fitted values from the average model AICc<2
        DF_Model.avg.pred.dist.nb <- data.frame(Model.avg.pred.dist.nb, subset(dataRS.dist, select = c(M_distance, F_distance, NbE))) #create the dataframe with the variables to plot
        NbE_M_dist = ggplot(DF_Model.avg.pred.dist.nb, aes(x=M_distance, y=Model.avg.pred.dist.nb)) +
          geom_point(size=0.5) + 
          stat_smooth(method = "glm", method.args = list(family = poisson), se = T, alpha = .25, color = "black") +
          labs(x="Male distance", y = "Number eggs") + 
          theme_bw() +
          theme(text=element_text(size=11))

Plot predicted nb eggs ~ sex distance

  1. I entered the intercept and slope values in geom_abline. How can I add the CI as well?
ggplot() +
  scale_x_continuous(name="Distance", limits=c(0,20)) +
  scale_y_continuous(name="Number eggs", limits=c(5,7.5)) +
  scale_color_discrete(name="Sex") +
  geom_abline(aes(intercept=exp(1.827949241), slope=-0.030771247, colour="Female")) + 
  geom_abline(aes(intercept=exp(1.827949241), slope=-0.009227452, colour="Male")) +
  theme_bw()

plot with intercept and slopes

For the binomial model, I did this (taken from here):

full.model.dist.bs = glmer(Breed_Suc ~ M_distance + M_distance_QUA + F_distance + 
    F_distance_QUA + M_distance*F_distance + AgeF + AgeM + scale.LD + scale.LD_Quad + 
    (1|Year) + (1|MaleID)+ (1|FemaleID), data = dataRS.dist, na.action = na.fail , 
    nAGQ = 0, weights = dataRS.dist$NbE,family = "binomial"(link="logit"))
summary(bs.dist.avg.model)

Call:
model.avg(object = best.bs.dist.models)

Component model call: 
glmer(formula = Breed_Suc ~ <6 unique rhs>, data = dataRS.dist, family = 
     binomial(link = "logit"), nAGQ = 0, weights = dataRS.dist$NbE, na.action = 
     na.fail)

Component models: 
          df   logLik    AICc delta weight
12356789  12 -1727.73 3479.82  0.00   0.25
125678    10 -1729.89 3480.03  0.21   0.22
123456789 13 -1726.96 3480.34  0.51   0.19
1278       8 -1732.43 3481.03  1.21   0.13
1235789   11 -1729.52 3481.33  1.51   0.12
12345789  12 -1728.72 3481.80  1.98   0.09

Term codes: 
                   z.AgeF                    z.AgeM              z.F_distance
                        1                         2                         3
         z.F_distance_QUA              z.M_distance          z.M_distance_QUA
                        4                         5                         6 
               z.scale.LD           z.scale.LD_Quad z.F_distance:z.M_distance
                        7                         8                         9 

Model-averaged coefficients:  
(full average) 
                           Estimate Std. Error Adjusted SE z value Pr(>|z|)
(Intercept)                0.59571    0.08768     0.08780   6.785    <2e-16 ***
z.F_distance              -0.08331    0.22438     0.22455   0.371    0.7106
z.M_distance               0.30106    0.35661     0.35686   0.844    0.3989
z.M_distance_QUA          -0.38521    0.37397     0.37420   1.029    0.3033
z.AgeF                     0.23875    0.11164     0.11180   2.136    0.0327 *  
z.AgeM                     0.07179    0.10853     0.10868   0.661    0.5089   
z.scale.LD                -0.21892    0.51109     0.51179   0.428    0.6688 
z.scale.LD_Quad           -0.21826    0.50536     0.50605   0.431    0.6662   
z.F_distance:z.M_distance  0.23584    0.22598     0.22610   1.043    0.2969 
z.F_distance_QUA           0.10247    0.22963     0.22979   0.446    0.6556 
 
(conditional average) 
                           Estimate Std. Error Adjusted SE z value Pr(>|z|) 
(Intercept)                0.59571    0.08768     0.08780   6.785    <2e-16 ***
z.F_distance              -0.12943    0.26879     0.26902   0.481    0.6304  
z.M_distance               0.34787    0.36147     0.36176   0.962    0.3362    
z.M_distance_QUA          -0.58524    0.30888     0.30930   1.892    0.0585 .  
z.AgeF                     0.23875    0.11164     0.11180   2.136    0.0327 *  
z.AgeM                     0.07179    0.10853     0.10868   0.661    0.5089    
z.scale.LD                -0.21892    0.51109     0.51179   0.428    0.6688    
z.scale.LD_Quad           -0.21826    0.50536     0.50605   0.431    0.6662    
z.F_distance:z.M_distance  0.36639    0.17748     0.17772   2.062    0.0392 *  
z.F_distance_QUA           0.36356    0.30358     0.30400   1.196    0.2317                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
coefs <- coef(bs.dist.avg.model)
x_plot <- seq(0, 20, by = 0.1)
y_plot <- plogis(coefs[1] + coefs[2] * x_plot) # coefs[1]=  0.5957089; coefs[2]=-0.08330994
y_plot2 <- plogis(coefs[1] + coefs[3] * x_plot) # coefs[1]=  0.5957089; coefs[3]=0.3010592
plot_data <- data.frame(x_plot, y_plot,y_plot2)

ggplot(plot_data) + 
  geom_line(aes(x_plot, y_plot), col = "red") + 
  geom_line(aes(x_plot, y_plot2), col = "blue") + 
  xlab("Distance") + ylab("Breeding success") +
  scale_y_continuous(limits = c(0, 1)) + 
  theme_bw()

plot loglink regression

If you have any suggestion on how to do it differently, it is very much welcome! Thanks in advance.

EDITED: with what I hope is a reproducible example (if not, I am sorry, then I don't understand what a reproducible example is)


Solution

  • approach 1

    I can sort of understand plotting the predicted values as points, although it might make more sense to show either the original data points or the partial residuals (e.g. the remef package on GitHub). Plotting a Poisson GLM through the predicted values is a little bit weird: while the GLM fit through the predicted values might be approximately the same as the predictions of the full model (although I'd have trouble proving it), but the confidence intervals are not the same as the confidence intervals on the predictions from the averaged model.

    approach 2

    The prediction from a fitted log-link (Poisson) model is y = exp(a + b*x), not y=exp(a) + b*x (which looks like what you're doing here).

    approach 3

    This looks OK, although it will only give sensible predictions if all of your other covariates are zero-centred. (And you don't get confidence intervals, and there are easier ways to do this.)

    You can use the ggeffects package to get predictions and confidence intervals, with the caveat that the confidence intervals don't incorporate any uncertainty in the random effects (and the caveat that I haven't looked into how these are computed, so I'm not sure how they propagate the model-averaging uncertainty).

    example

    set up data and model

    I used an interaction term in the model: as far as I can tell you set up explicit dummy variables for the interaction.

    set.seed(101)
    n <- 1000
    ng <- 20
    dd <- data.frame(x = rnorm(n),
                     y = rnorm(n),
                     sex = factor(rep(c("F","M"), length.out = n)),
                     f = factor(rep(1:ng, length.out = n)))
    library(lme4)
    library(MuMIn)
    dd$resp <- simulate(~ x*sex + y + (1|f),
                        newdata = dd,
                        newparams = list(beta = rep(1,5),
                                         theta = 1),
                        family = poisson)[[1]]
    
    
    full_model <- glmer(resp  ~ x*sex + y + (1|f), data = dd,
                        family = poisson, na.action = na.fail)
    
    dr <- dredge(full_model)
    avg_model <- model.avg(dr, fit = TRUE)
    

    predictions and plot

    I tried the sjPlot and ggeffects packages but in the end found it easier to do what I wanted with emmeans.

    library(emmeans)
    g1 <- emmeans(avg_model, specs = ~x*sex,
                  type = "response",
                  data = dd,
                  at = list(x = seq(-4, 4, length.out = 101)))
    g2 <- as.data.frame(g1)
    
    library(ggplot2); theme_set(theme_bw())
    gg0 <- ggplot(dd, aes(x, resp, colour = sex)) +
        ## scale_y_log10(limits = c(0.1, 1e4)) + 
        geom_point() +
        geom_line(data = g2, aes(y = rate)) +
        geom_ribbon(data = g2, aes(
                                   y = rate, ymin = lower.CL, ymax = upper.CL, fill = sex),
                    colour = NA, alpha = 0.3)
     
    print(gg0)
    print(gg0 + scale_y_log10(limits=c(0.1, 1e4), oob = scales::squish))