Search code examples
rggplot2regressionpolynomials

'How to plot a polynomial line and equation using ggplot and then combine them into one plot'


I am trying to compare and contrast on one plot the difference between four relationships with a similar x-axis.

I can seem to plot the regression line but have no idea how to plot the equation and/or combine all four plots onto one.

Here is the basic foundation of my code: Sorry if it is pretty basic or clumsy, I am just beginning.

library(ggplot2)
library(cowplot)

p1 <- ggplot(NganokeData, aes(x=Depth,y=LCU1)) + geom_point() + 
  labs(x ='Depths (cm)', y ='Density (Hu)', title = 'Density Regression of Lake Nganoke Core 1') +
  ylim(1,2)

p2 <- ggplot(NganokeData, aes(x=Depth,y=LCU2)) + geom_point() + 
  labs(x ='Depths (cm)', y ='Density (Hu)', title = 'Density Regression of Lake Nganoke Core 2') +
  ylim(1,2)

p3 <- ggplot(NganokeData, aes(x=Depth,y=LCU3)) + geom_point() + 
  labs(x ='Depths (cm)', y ='Density (Hu)', title = 'Density Regression of Lake Nganoke Core 3') +
  ylim(1,2)

p4 <- ggplot(NganokeData, aes(x=Depth,y=LCU4)) + geom_point() + 
  labs(x ='Depths (cm)', y ='Density (Hu)', title = 'Density Regression of Lake Nganoke Core 4') +
  ylim(1,2)

p3 + stat_smooth(method = "lm", formula = y ~ poly(x, 3), size = 1) #Adds polynomial regression

Picture of my code


Solution

  • It looks like you have a variable of interest (LCU1, LCU2, LCU3, LCU4) in the column names. You can use gather from the tidyr package to reshape the data frame:

    library(tidyr)
    long_data <- gather(NganokeData, key = "core", value = "density",
                        LCU1, LCU2, LCU3, LCU4)
    

    And then use facet_grid from the ggplot2 package to divide your plot into the four facets you are looking for.

    p <- ggplot(long_data, aes(x=Depth,y=density)) + 
              geom_point() + 
              labs(x ='Depths (cm)', y ='Density (Hu)', 
                   title = 'Density Regression of Lake Nganoke Cores') +
         ylim(1,2) +
         facet_grid(rows = vars(core)) #can also use cols instead
    
    p + stat_smooth(method = "lm", formula = y ~ poly(x, 3), size = 1)
    

    Your code is great. But as a beginner, I highly recommend taking a few minutes to read and learn to use the tidyr package, as ggplot2 is built on the concepts of tidy data, and you'll find it much easier to make your visualizations if you can manipulate the data frame into the format you need before trying to plot it.

    https://tidyr.tidyverse.org/index.html

    EDIT:

    To add an annotation detailing the regression equation, I found code taken from this blog post by Jodie Burchell:

    http://t-redactyl.io/blog/2016/05/creating-plots-in-r-using-ggplot2-part-11-linear-regression-plots.html

    First, though, it's not going to be possible to glean a displayable regression equation using the poly function in your formula as you have it. The advantage of orthogonal polynomials is that they avoid collinearity, but the drawback is that you no longer have an easily interpretable regression equation with x and x squared and x cubed as regressors.

    So we'll have to change the lm fit formula to

    y ~ poly(x, 3, raw = TRUE)
    

    which will fit the raw polynomials and give you the regression equation you are looking for.

    You are going to have to alter the x and y position values to determine where on the graph to place the annotation, since I don't have your data, but here's the custom function you need:

    equation = function(x) {
      lm_coef <- list(a = round(coef(x)[1], digits = 2),
                      b = round(coef(x)[2], digits = 2),
                      c = round(coef(x)[3], digits = 2),
                      d = round(coef(x)[4], digits = 2),
                      r2 = round(summary(x)$r.squared, digits = 2));
      lm_eq <- substitute(italic(y) == a + b %.% italic(x) + c %.% italic(x)^2 + d %.% italic(x)^3*","~~italic(R)^2~"="~r2,lm_coef)
      as.character(as.expression(lm_eq));                 
    }
    

    Then just add the annotation to your plot, adjust the x and y parameters as needed, and you'll be set:

    p + 
        stat_smooth(method = "lm", formula = y ~ poly(x, 3, raw = TRUE), size = 1) +
        annotate("text", x = 1, y = 10, label = equation(fit), parse = TRUE)