I have run polynomial regressions on the data that I am including from Quadratic to Septic but I am stuck trying to plot these regression curves on my scatter plot. I am asking for help creating code that will work for each polynomial order.
Time <- Mangan_2008_total$YearMonthDay
Abundance <- Mangan_2008_total$`Total Nymph Aa`
Above is the code I used to isolate the data I included in this post from the larger data set. Below is the data for reference.
(The dates are currently in the Gregorian calendar format. I plan to convert them to the Julian Days calendar at some point.)
Time
1 20080301
2 20080316
3 20080402
4 20080416
5 20080428
6 20080514
7 20080527
8 20080608
9 20080627
10 20080709
11 20080722
12 20080806
13 20080818
14 20080901
15 20080915
16 20080930
17 20081013
18 20081029
19 20081110
20 20081124
Abundance
1 0
2 0
3 26
4 337
5 122
6 232
7 190
8 381
9 148
10 201
11 69
12 55
13 35
14 15
15 6
16 1
17 0
18 1
19 0
20 0
I compiled that data into a data frame for easier manipulation:
Mangan_2008_nymph <- data.frame(Time, Abundance)
Here is the code for the scatter plot I made in ggplot:
Nymph_2008_Plot <- ggplot(Mangan_2008_nymph, aes(Time, Abundance)) +
geom_point(size=4, col='red') + ggtitle("2008 Amblyomma americanum Abundance") +
xlab("Time") + ylab("Nymph Abundance")
Nymph_2008_Plot
Here is the code I used for the regression analysis (To run higher order polynomial regressions, I simply swapped the 2 (the degree value) for the corresponding polynomial order):
Quadratic_2008_Nymph <- lm(Abundance ~ poly(Time, 2))
summary(Quadratic_2008_Nymph)
This is where I get stuck. How do I graph the polynomial regression curves onto my plot? If there is a way to do this with the ggplot format that would be preferred. If plotting these polynomial curves onto a ggplot plot wont work then I will switch my formatting.
Thanks in advance and comment if I need to clarify/provide more information.
The first thing I would do here is to convert the numbers you are treating as dates into actual dates. If you don't do this, lm
will give the wrong result; as an example, rows 1 and 2 of your data frame represent data 15 days apart (20080316 - 20080301 = 15), but then rows 2 and 3 are 17 days apart, yet the regression will see them as being 86 days apart (20080402 - 20080316 = 86). This will cause lm
to produce nonsensical results.
It is good practice to get into the habit of changing numbers or character strings that represent date and time data into actual dates and times as early in your analysis as you can. It makes the rest of your code much easier. In your case, that would just be:
Mangan_2008_nymph$Time <- as.Date(as.character(Mangan_2008_nymph$Time), "%Y%m%d")
For the plot itself, you can add the polynomial regression lines using geom_smooth
. You specify the method lm
, and the formula (in terms of x and y, not in terms of the variable names). It is also good idea to map the line to a color aesthetic so that it appears in a legend. Do this once for each polynomial order you wish to add.
library(ggplot2)
ggplot(Mangan_2008_nymph, aes(Time, Abundance)) +
geom_point(size = 4, col = 'red') +
geom_smooth(method = "lm", formula = y ~ poly(x, 2), aes(color = "2"), se = F) +
geom_smooth(method = "lm", formula = y ~ poly(x, 3), aes(color = "3"), se = F) +
geom_smooth(method = "lm", formula = y ~ poly(x, 4), aes(color = "4"), se = F) +
geom_smooth(method = "lm", formula = y ~ poly(x, 5), aes(color = "5"), se = F) +
geom_smooth(method = "lm", formula = y ~ poly(x, 6), aes(color = "6"), se = F) +
geom_smooth(method = "lm", formula = y ~ poly(x, 7), aes(color = "7"), se = F) +
labs(x = "Time", y = "Nymph abundance", color = "Degree") +
ggtitle("2008 Amblyomma americanum Abundance")
Personally, I think this ends up making the plot a little cluttered, and I would consider dropping a couple of polynomial levels to make this plot easier to read. You can also easily add a few stylistic tweaks:
ggplot(Mangan_2008_nymph, aes(Time, Abundance)) +
geom_point(size = 2, col = 'gray50') +
geom_smooth(method = "lm", formula = y ~ poly(x, 3), aes(color = "3"), se = FALSE) +
geom_smooth(method = "lm", formula = y ~ poly(x, 5), aes(color = "5"), se = FALSE) +
geom_smooth(method = "lm", formula = y ~ poly(x, 7), aes(color = "7"), se = FALSE) +
scale_color_brewer(palette = "Set1") +
labs(x = "Time", y = "Nymph abundance", color = "Degree") +
ggtitle("2008 Amblyomma americanum Abundance") +
theme_classic() +
theme(panel.grid.major = element_line())
Data used
Mangan_2008_nymph <- structure(list(Time = c(20080301L, 20080316L, 20080402L,
20080416L, 20080428L, 20080514L, 20080527L, 20080608L, 20080627L, 20080709L,
20080722L, 20080806L, 20080818L, 20080901L, 20080915L, 20080930L,
20081013L, 20081029L, 20081110L, 20081124L), Abundance = c(0L,
0L, 26L, 337L, 122L, 232L, 190L, 381L, 148L, 201L, 69L, 55L,
35L, 15L, 6L, 1L, 0L, 1L, 0L, 0L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20"))