I am trying to graph expected values and actual values over time, and I'd like to get all of my data on one graph. I am still pretty new to R and keep getting stuck.
So far I have been able to get what I want on separate graphs, but when I put everything on one graph I can't get it to behave the way I want.
I am almost there, but I'd like to have the points (the expected values) connected with a dashed line. I tried adding a LOESS line a few different ways (one is commented out in my code), but I keep getting errors.
I am still new to R (and coding in general), but I know there has to be a way to do this besides building up the plot manually. Every example I try does something that I want, but I can't seem to get everything to work at once. I am starting to understand what each piece does, but sometimes I get lost in what works with what.
The errors I get:
Error in xy.coords(x, y, xlabel, ylabel) : 'x' is a list, but does not have components 'x' and 'y'
Error: Don't know how to add RHS to a theme object
My plot (without the points connected):
My Dataset
Year,SC_CE_5AGG,SC_ACA,TA_CE_5AGG,TA_ACA,OA_CE_5AGG,OA_ACA,CO_CE_5AGG,CO_ACA
2005,8,12,5,0,140,100,23,23
2006,,13,,0,,100,,25
2007,,13,,0,,102,,37
2008,,14,,0,,104,,36
2009,,16,,3,,104,,35
2010,10,17,6,4,179,106,29,36
2011,,20,,7,,111,,36
2012,,23,,7,,116,,33
2013,,22,,10,,118,,37
2014,,23,,12,,107,,40
2015,12,23,8,14,229,112,37,46
2016,,25,,14,,119,,56
2017,,28,,13,,120,,60
2018,,,,,,,,
2019,,,,,,,,
2020,16,,10,,292,,48,
2025,20,,20,,372,,61,
My code
setwd("C:/Users/X/Documents/PROJECTS/R_RcW/Data")
install.packages("ggplot2")
install.packages("GGally")
library(ggplot2)
library(GGally)
ALL <- read.csv(file="Rcw_data.csv", header = TRUE)
# To plot multiple lines (for a small number of variables) you can build up the plot manually yourself
ggplot(data=ALL, aes(Year)) +
geom_line(aes(y = SC_ACA, colour = "Shoal Creek")) +
lines(scatter.smooth(aes(y = SC_CE_5AGG, colour = "Shoal Creek"))) +
geom_line(aes(y = TA_ACA, colour = "Talladega")) +
lines(scatter.smooth(aes(y = TA_CE_5AGG, colour = "Talladega"))) +
geom_line(aes(y = OA_ACA, colour = "Oakmulgee")) +
lines(scatter.smooth(aes(y = OA_CE_5AGG, colour = "Oakmulgee"))) +
geom_line(aes(y = CO_ACA, colour = "Conecuh")) +
lines(scatter.smooth(aes(y = CO_CE_5AGG, colour = "Conecuh"))) +
#lines(lowess(SC_CE_5AGG), col="Shoal Creek") + # lowess line (x,y)
#lines(lowess(TA_CE_5AGG), col="Talladega") + # lowess line (x,y)
#lines(lowess(OA_CE_5AGG), col="Oakmulgee") + # lowess line (x,y)
#lines(lowess(CO_CE_5AGG), col="Conecuh") # lowess line (x,y)
theme_classic() +
ggtitle("Active clusters of Red-cockaded Woodpeckers") +
theme(plot.title = element_text(hjust = 0.5)) +
labs(colour="District") +
theme(legend.title.align=0.5) +
theme(panel.border = element_rect(colour = "black", fill=NA, size=)) +
scale_x_continuous(limits=c(2005, 2025), breaks=c(2005,2010,2015,2020,2025)) +
xlab("Year") + ylab("Number of active clusters")
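I think you will be better off reshaping your data to long format, so that a single geom_line() call draws every series and dropping the NA rows lets each line run straight between the sparse expected values. Something like: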
library(tidyverse)
library(reshape2)
Data:
ALL <- structure(list(Year = c(2005L, 2006L, 2007L, 2008L, 2009L, 2010L,
2011L, 2012L, 2013L, 2014L, 2015L, 2016L, 2017L, 2018L, 2019L,
2020L, 2025L), SC_CE_5AGG = c(8L, NA, NA, NA, NA, 10L, NA, NA,
NA, NA, 12L, NA, NA, NA, NA, 16L, 20L), SC_ACA = c(12L, 13L,
13L, 14L, 16L, 17L, 20L, 23L, 22L, 23L, 23L, 25L, 28L, NA, NA,
NA, NA), TA_CE_5AGG = c(5L, NA, NA, NA, NA, 6L, NA, NA, NA, NA,
8L, NA, NA, NA, NA, 10L, 20L), TA_ACA = c(0L, 0L, 0L, 0L, 3L,
4L, 7L, 7L, 10L, 12L, 14L, 14L, 13L, NA, NA, NA, NA), OA_CE_5AGG = c(140L,
NA, NA, NA, NA, 179L, NA, NA, NA, NA, 229L, NA, NA, NA, NA, 292L,
372L), OA_ACA = c(100L, 100L, 102L, 104L, 104L, 106L, 111L, 116L,
118L, 107L, 112L, 119L, 120L, NA, NA, NA, NA), CO_CE_5AGG = c(23L,
NA, NA, NA, NA, 29L, NA, NA, NA, NA, 37L, NA, NA, NA, NA, 48L,
61L), CO_ACA = c(23L, 25L, 37L, 36L, 35L, 36L, 36L, 33L, 37L,
40L, 46L, 56L, 60L, NA, NA, NA, NA)), .Names = c("Year", "SC_CE_5AGG",
"SC_ACA", "TA_CE_5AGG", "TA_ACA", "OA_CE_5AGG", "OA_ACA", "CO_CE_5AGG",
"CO_ACA"), class = "data.frame", row.names = c(NA, -17L))
ALL %>%
  melt(id = "Year") %>%                              # wide to long: Year, variable, value
  na.omit() %>%                                      # drop empty rows so lines span the gaps
  mutate(est = factor(grepl("5AGG", variable))) %>%  # TRUE for the expected (CE_5AGG) columns
  ggplot(aes(Year, value, color = variable, lty = est)) +
  geom_line() +
  theme_classic() +
  ggtitle("Active clusters of Red-cockaded Woodpeckers") +
  theme(plot.title = element_text(hjust = 0.5)) +
  labs(colour = "District") +
  theme(legend.title.align = 0.5) +
  theme(panel.border = element_rect(colour = "black", fill = NA)) +
  scale_x_continuous(limits = c(2005, 2025),
                     breaks = c(2005, 2010, 2015, 2020, 2025)) +
  xlab("Year") + ylab("Number of active clusters")
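grepl() was used to flag the estimated (expected) values: any column whose name contains "5AGG" gets est = TRUE, and est is then mapped to the line type.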
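If you would rather have one colour per district and a dashed line only for the expected values, as described in the question, the column name can be split into two variables after melting. This is only a minimal sketch: it assumes the two-letter prefixes stand for the district names used in the question's legend, and the `codes` lookup and the `District` and `Type` columns are names I made up for illustration.

```r
library(ggplot2)
library(reshape2)

# Data from the question; NA where nothing was recorded
gap <- rep(NA, 4)
ALL <- data.frame(
  Year       = c(2005:2020, 2025),
  SC_CE_5AGG = c(8, gap, 10, gap, 12, gap, 16, 20),
  SC_ACA     = c(12, 13, 13, 14, 16, 17, 20, 23, 22, 23, 23, 25, 28, gap),
  TA_CE_5AGG = c(5, gap, 6, gap, 8, gap, 10, 20),
  TA_ACA     = c(0, 0, 0, 0, 3, 4, 7, 7, 10, 12, 14, 14, 13, gap),
  OA_CE_5AGG = c(140, gap, 179, gap, 229, gap, 292, 372),
  OA_ACA     = c(100, 100, 102, 104, 104, 106, 111, 116, 118, 107, 112, 119, 120, gap),
  CO_CE_5AGG = c(23, gap, 29, gap, 37, gap, 48, 61),
  CO_ACA     = c(23, 25, 37, 36, 35, 36, 36, 33, 37, 40, 46, 56, 60, gap)
)

# Wide to long, then drop the NA rows so each line connects its remaining points
long <- na.omit(melt(ALL, id.vars = "Year"))
long$variable <- as.character(long$variable)

# Assumed mapping from column prefix to district name (hypothetical lookup)
codes <- c(SC = "Shoal Creek", TA = "Talladega", OA = "Oakmulgee", CO = "Conecuh")
long$District <- unname(codes[substr(long$variable, 1, 2)])

# Columns containing "5AGG" hold the expected values; the rest are actual counts
long$Type <- ifelse(grepl("5AGG", long$variable), "Expected", "Actual")

p <- ggplot(long, aes(Year, value, colour = District, linetype = Type)) +
  geom_line() +
  geom_point(data = subset(long, Type == "Expected")) +  # mark the expected values
  scale_linetype_manual(values = c(Actual = "solid", Expected = "dashed")) +
  theme_classic() +
  labs(title = "Active clusters of Red-cockaded Woodpeckers",
       x = "Year", y = "Number of active clusters")
print(p)
```

Because colour and line type are mapped to separate columns, the legend gets one entry per district plus a solid/dashed key, instead of one entry per original column.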