Search code examples
rggplot2lm

Can I add a randoma effect to stat_poly_eq() in ggplot?


I am interested in adding a random effect to a ggplot using stat_poly_eq().

library (ggplot2)
library(ggpmisc)

data (mtcars)
mtcars$gear   <- factor (mtcars$gear)
mtcars$cyl    <- factor (mtcars$cyl)
transmission <- data.frame ( code = c(1, 0),
                             type = c("manual", "automatic"))
mtcars$transmission <- setNames (transmission$type, transmission$code) [as.character (mtcars$am)]

my.formula <- y ~ x 

p <- ggplot       (data = mtcars, aes(x=hp, y=mpg, group = 1)) +
  facet_wrap   (~transmission, strip.position = "top", scales = "fixed", nrow = 1) +
  ggtitle      ("car horsepower (hp) to effiency (mpg) based on cylinder and number of gears for American vs others cars") +
  geom_point   ( size = 6, shape = 16, aes ( shape=cyl, color=cyl) ) +
  scale_color_manual (name       = "number of cylinder",
                      labels     = c (paste (levels(mtcars$cyl), "cylinders")),
                      values     = c("turquoise", "orange", "red")) +
  
  geom_smooth  (data = mtcars, method="lm",
                formula = my.formula,
                se = FALSE,
                color = "blue",
                fill = "blue") +
  
  stat_poly_eq (formula = my.formula,
                aes(label = paste(after_stat(eq.label), after_stat(adj.rr.label), after_stat(p.value.label), sep = "~~~")),
                parse = TRUE,
                size = 4,
                col = "black") +
  
  theme        (panel.spacing    = unit(0, "lines"),
                strip.background = element_blank(),
                plot.title       = element_text(hjust = 0.5,
                                                color = "black",
                                                size  = 14,
                                                face  = "bold.italic"),
                strip.placement = "outside")
print(p)

working graph before random effect

I'd assume blocking and adding random effects would be the same as for any linear model (+ Block OR + (1 | random.effect). However, I am not getting the code to work. Does anyone have any good ideas?

Here is an example of what I am talking about using mtcars. In this case I'll ask you to imagine that the vehicle weight is a random effect (I understand that it is likely not random effect but the example dataset doesn't include a good random effect. Here is the code I use to add the random effect and the regression line fails.

my.formula <- y ~ x + (1 | mtcars$wt)


Solution

  • When you use method = "lm" in geom_smooth, you are using the function lm to create a simple linear model on each subgroup of points in each panel. ggplot will split the data frame into multiple smaller data frames to do this. You cannot pass an entire column of the original data to your formula, since it will contain all the rows of the data frame, and hence be a different length from all the other variables in the subgroup being analysed.

    Furthermore, ggplot uses the predict function to get predicted values for your regression. It uses the range of the x axis to generate the values at which to predict. If you use variables other than y and x in the formula to geom_smooth, it will not know what values you want for these variables in the predict function, and will simply not work. You can therefore only use y and x in the formula passed to geom_smooth. This means it isn't suitable for mixed effect models, which by definition use more than one variable on the right hand side.

    To make matters worse, the (1 | wt) syntax does not create a random intercept mixed effects model inside lm. If you want to create a mixed-effects model, you need to use a specifically-designed function like lmer from the lme4 package.

    For both the above reasons, stat_poly_eq will not work in this scenario. It is designed only to work with lm regressions.

    However, all is not lost. It is still possible to get the plot you want using ggplot as long as we work through the necessary steps.

    First, let's create a data set that lends itself to a mixed effects model:

    set.seed(1)
    
    df <- data.frame(
      x = rep(1:10, 5) + runif(50), 
      y = rep(1:10, 5) + rep(rnorm(5, 5, 5), each = 10) + rnorm(50),
      z = rep(LETTERS[1:5], each = 10)
    )
    

    We can load lme4 and create our model with a random intercept

    library(lme4)
    #> Loading required package: Matrix
    
    model <- lmer(y ~ x + (1|z), data = df)
    
    model
    #> Linear mixed model fit by REML ['lmerMod']
    #> Formula: y ~ x + (1 | z)
    #>    Data: df
    #> REML criterion at convergence: 161.1372
    #> Random effects:
    #>  Groups   Name        Std.Dev.
    #>  z        (Intercept) 3.3242  
    #>  Residual             0.9568  
    #> Number of obs: 50, groups:  z, 5
    #> Fixed Effects:
    #> (Intercept)            x  
    #>      2.8896       0.9926
    

    Now, let's say we want to plot the implied regression lines for each group of our data, as well as the overall fixed effect in ggplot. We can start by creating a data frame of x and z values for which we want to predict the y values. We then feed this to predict and store the result:

    plot_df <- data.frame(x = rep(1:10, 5), z = rep(LETTERS[1:5], each = 10))
    plot_df$y <- predict(model, newdata = plot_df)
    

    From the comments it seems that you also want an R-squared and p values. Just be aware that these are not as straightforward to obtain or interpret for mixed models, and you should ensure you know what these mean when putting them on your plot. You can obtain them in this case as follows:

    r2 <- round(performance::r2_nakagawa(model)$R2_conditional, 3)
    pval <-  scales::pvalue(car::Anova(model, test = "F")$`Pr(>F)`, add_p = TRUE)
    

    All that remains for us to do now is plot our raw data, the regression lines, the overall fixed effect, and the equation for the fixed effect using a custom annotation extracted from our model:

    library(ggplot2)
    
    ggplot(df, aes(x, y, color = z)) +
      geom_point(size = 3, alpha = 0.5) +
      geom_line(data = plot_df, linetype = 2) +
      geom_abline(aes(intercept = fixef(model)[1], slope = fixef(model)[2],
                      linetype = "Fixed effect"), key_glyph = draw_key_path,
                  linewidth = 1) +
      scale_color_brewer(palette = "Set1") +
      labs(linetype = NULL, color = "groups (z)") +
      annotation_custom(grid::textGrob(
        bquote(bold(~y == 
                 .(round(fixef(model)[1], 3)) + .(round(fixef(model)[2], 3)) * x) ~
                 ~~~italic(paste(.(pval))~~~R^2 == .(r2))),
        x = 0.05, y = 0.9, hjust = 0, gp = grid::gpar(cex = 1.2))) +
      theme_minimal(base_size = 16) +
      ggtitle("Illustration of mixed effects model with random intercept") +
      theme(legend.position = "top",
            plot.title = element_text(face = 2, hjust = 0.5),
            panel.border = element_rect(fill = NA, linewidth = 0.2))
    

    enter image description here