Search code examples
rggplot2plotregressionnon-linear-regression

Plotting Polynomial Regression Curves in R


I have run polynomial regressions on the data that I am including from Quadratic to Septic but I am stuck trying to plot these regression curves on my scatter plot. I am asking for help creating code that will work for each polynomial order.

Time <- Mangan_2008_total$YearMonthDay
Abundance <- Mangan_2008_total$`Total Nymph Aa`

Above is the code I used to isolate the data I included in this post from the larger data set. Below is the data for reference.

(The dates are currently in the Gregorian calendar format. I plan to convert them to the Julian Days calendar at some point.)

Time
1   20080301
2   20080316
3   20080402
4   20080416
5   20080428
6   20080514
7   20080527
8   20080608
9   20080627
10  20080709
11  20080722
12  20080806
13  20080818
14  20080901
15  20080915
16  20080930
17  20081013
18  20081029
19  20081110
20  20081124

Abundance
1   0
2   0
3   26
4   337
5   122
6   232
7   190
8   381
9   148
10  201
11  69
12  55
13  35
14  15
15  6
16  1
17  0
18  1
19  0
20  0

I compiled that data into a data frame for easier manipulation:

Mangan_2008_nymph <- data.frame(Time, Abundance)

Here is the code for the scatter plot I made in ggplot:

Nymph_2008_Plot <- ggplot(Mangan_2008_nymph, aes(Time, Abundance)) + 
  geom_point(size=4, col='red') + ggtitle("2008 Amblyomma americanum Abundance") + 
  xlab("Time") + ylab("Nymph Abundance")
Nymph_2008_Plot

Here is the code I used for the regression analysis (To run higher order polynomial regressions, I simply swapped the 2 (the degree value) for the corresponding polynomial order):

Quadratic_2008_Nymph <- lm(Abundance ~ poly(Time, 2))
summary(Quadratic_2008_Nymph)

This is where I get stuck. How do I graph the polynomial regression curves onto my plot? If there is a way to do this with the ggplot format that would be preferred. If plotting these polynomial curves onto a ggplot plot wont work then I will switch my formatting.

Thanks in advance and comment if I need to clarify/provide more information.


Solution

  • The first thing I would do here is to convert the numbers you are treating as dates into actual dates. If you don't do this, lm will give the wrong result; as an example, rows 1 and 2 of your data frame represent data 15 days apart (20080316 - 20080301 = 15), but then rows 2 and 3 are 17 days apart, yet the regression will see them as being 86 days apart (20080402 - 20080316 = 86). This will cause lm to produce nonsensical results.

    It is good practice to get into the habit of changing numbers or character strings that represent date and time data into actual dates and times as early in your analysis as you can. It makes the rest of your code much easier. In your case, that would just be:

    Mangan_2008_nymph$Time <- as.Date(as.character(Mangan_2008_nymph$Time), "%Y%m%d")
    

    For the plot itself, you can add the polynomial regression lines using geom_smooth. You specify the method lm, and the formula (in terms of x and y, not in terms of the variable names). It is also good idea to map the line to a color aesthetic so that it appears in a legend. Do this once for each polynomial order you wish to add.

    library(ggplot2)
    
    ggplot(Mangan_2008_nymph, aes(Time, Abundance)) + 
      geom_point(size = 4, col = 'red') + 
      geom_smooth(method = "lm", formula = y ~ poly(x, 2), aes(color = "2"), se = F) +
      geom_smooth(method = "lm", formula = y ~ poly(x, 3), aes(color = "3"), se = F) +
      geom_smooth(method = "lm", formula = y ~ poly(x, 4), aes(color = "4"), se = F) +
      geom_smooth(method = "lm", formula = y ~ poly(x, 5), aes(color = "5"), se = F) +
      geom_smooth(method = "lm", formula = y ~ poly(x, 6), aes(color = "6"), se = F) +
      geom_smooth(method = "lm", formula = y ~ poly(x, 7), aes(color = "7"), se = F) +
      labs(x = "Time", y = "Nymph abundance", color = "Degree") +
      ggtitle("2008 Amblyomma americanum Abundance") 
    

    Personally, I think this ends up making the plot a little cluttered, and I would consider dropping a couple of polynomial levels to make this plot easier to read. You can also easily add a few stylistic tweaks:

    ggplot(Mangan_2008_nymph, aes(Time, Abundance)) + 
      geom_point(size = 2, col = 'gray50') + 
      geom_smooth(method = "lm", formula = y ~ poly(x, 3), aes(color = "3"), se = FALSE) +
      geom_smooth(method = "lm", formula = y ~ poly(x, 5), aes(color = "5"), se = FALSE) +
      geom_smooth(method = "lm", formula = y ~ poly(x, 7), aes(color = "7"), se = FALSE) +
      scale_color_brewer(palette = "Set1") +
      labs(x = "Time", y = "Nymph abundance", color = "Degree") +
      ggtitle("2008 Amblyomma americanum Abundance") +
      theme_classic() +
      theme(panel.grid.major = element_line())
    

    enter image description here


    Data used

    Mangan_2008_nymph <- structure(list(Time = c(20080301L, 20080316L, 20080402L, 
    20080416L, 20080428L, 20080514L, 20080527L, 20080608L, 20080627L, 20080709L, 
    20080722L, 20080806L, 20080818L, 20080901L, 20080915L, 20080930L, 
    20081013L, 20081029L, 20081110L, 20081124L), Abundance = c(0L, 
    0L, 26L, 337L, 122L, 232L, 190L, 381L, 148L, 201L, 69L, 55L, 
    35L, 15L, 6L, 1L, 0L, 1L, 0L, 0L)), class = "data.frame", row.names = c("1", 
    "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
    "14", "15", "16", "17", "18", "19", "20"))