Tags: r, time, ggplot2, comparison, line-plot

How to add curved fit lines to points for multi-variable data?


I am trying to graph expected values and actual values over time, and I'd like to get all of my data on one graph. I am still pretty new to R and keep getting stuck.

So far I have been able to get what I want on separate graphs, or if I get them all together, I can't seem to get it to do what I want.

I am almost there, but I'd like to have the points (points are expected values) connected with a dashed line. I tried adding a LOESS line a few different ways (one is commented out in my code), but I keep getting errors.

I am still new to R (and coding in general), but I know there has to be a way to do this besides building up the plot manually. However, every example I try will do something that I want, but I can't seem to get everything to work at once. I am starting to understand what each thing does, but sometimes I get lost in what works with what.

Error in xy.coords(x, y, xlabel, ylabel) : 'x' is a list, but does not have components 'x' and 'y'

Error: Don't know how to add RHS to a theme object

My plot: (without links connected)


My Dataset

Year,SC_CE_5AGG,SC_ACA,TA_CE_5AGG,TA_ACA,OA_CE_5AGG,OA_ACA,CO_CE_5AGG,CO_ACA
2005,8,12,5,0,140,100,23,23
2006,,13,,0,,100,,25
2007,,13,,0,,102,,37
2008,,14,,0,,104,,36
2009,,16,,3,,104,,35
2010,10,17,6,4,179,106,29,36
2011,,20,,7,,111,,36
2012,,23,,7,,116,,33
2013,,22,,10,,118,,37
2014,,23,,12,,107,,40
2015,12,23,8,14,229,112,37,46
2016,,25,,14,,119,,56
2017,,28,,13,,120,,60
2018,,,,,,,,
2019,,,,,,,,
2020,16,,10,,292,,48,
2025,20,,20,,372,,61,

My code

setwd("C:/Users/X/Documents/PROJECTS/R_RcW/Data")

# Install once; these do not need to be re-run in every session
install.packages("ggplot2")
install.packages("GGally")
library(ggplot2)
library(GGally)

ALL <- read.csv(file="Rcw_data.csv", header = TRUE)

# To plot multiple lines (for a small number of variables) you can build up the plot manually
ggplot(data=ALL, aes(Year)) + 
   geom_line(aes(y = SC_ACA, colour = "Shoal Creek")) + 
   lines(scatter.smooth(aes(y = SC_CE_5AGG, colour = "Shoal Creek"))) + 
   geom_line(aes(y = TA_ACA, colour = "Talladega")) +
   lines(scatter.smooth(aes(y = TA_CE_5AGG, colour = "Talladega"))) +
   geom_line(aes(y = OA_ACA, colour = "Oakmulgee")) + 
   lines(scatter.smooth(aes(y = OA_CE_5AGG, colour = "Oakmulgee"))) + 
   geom_line(aes(y = CO_ACA, colour = "Conecuh")) +
   lines(scatter.smooth(aes(y = CO_CE_5AGG, colour = "Conecuh"))) +
  #lines(lowess(SC_CE_5AGG), col="Shoal Creek") +  # lowess line (x,y) 
  #lines(lowess(TA_CE_5AGG), col="Talladega") +  # lowess line (x,y)
  #lines(lowess(OA_CE_5AGG), col="Oakmulgee") + # lowess line (x,y)
  #lines(lowess(CO_CE_5AGG), col="Conecuh") # lowess line (x,y)

  theme_classic() +
  ggtitle("Active clusters of Red-cockaded Woodpeckers") +  
  theme(plot.title = element_text(hjust = 0.5)) +
  labs(colour="District") + 
  theme(legend.title.align=0.5) +
  theme(panel.border = element_rect(colour = "black", fill=NA, size=)) +
  scale_x_continuous(limits=c(2005, 2025), breaks=c(2005,2010,2015,2020,2025)) +
  xlab("Year") + ylab("Number of active clusters")   

Solution

  • I think you will be better off reshaping your data to long format, something like:

    library(tidyverse)
    library(reshape2)
    

    Data (dput() output, assigned to ALL to match the question's code):

    ALL <- structure(list(Year = c(2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 
       2011L, 2012L, 2013L, 2014L, 2015L, 2016L, 2017L, 2018L, 2019L, 
       2020L, 2025L), SC_CE_5AGG = c(8L, NA, NA, NA, NA, 10L, NA, NA, 
       NA, NA, 12L, NA, NA, NA, NA, 16L, 20L), SC_ACA = c(12L, 13L, 
       13L, 14L, 16L, 17L, 20L, 23L, 22L, 23L, 23L, 25L, 28L, NA, NA, 
       NA, NA), TA_CE_5AGG = c(5L, NA, NA, NA, NA, 6L, NA, NA, NA, NA, 
       8L, NA, NA, NA, NA, 10L, 20L), TA_ACA = c(0L, 0L, 0L, 0L, 3L, 
       4L, 7L, 7L, 10L, 12L, 14L, 14L, 13L, NA, NA, NA, NA), OA_CE_5AGG = c(140L, 
       NA, NA, NA, NA, 179L, NA, NA, NA, NA, 229L, NA, NA, NA, NA, 292L, 
       372L), OA_ACA = c(100L, 100L, 102L, 104L, 104L, 106L, 111L, 116L, 
       118L, 107L, 112L, 119L, 120L, NA, NA, NA, NA), CO_CE_5AGG = c(23L, 
       NA, NA, NA, NA, 29L, NA, NA, NA, NA, 37L, NA, NA, NA, NA, 48L, 
       61L), CO_ACA = c(23L, 25L, 37L, 36L, 35L, 36L, 36L, 33L, 37L, 
       40L, 46L, 56L, 60L, NA, NA, NA, NA)), .Names = c("Year", "SC_CE_5AGG", 
       "SC_ACA", "TA_CE_5AGG", "TA_ACA", "OA_CE_5AGG", "OA_ACA", "CO_CE_5AGG", 
       "CO_ACA"), class = "data.frame", row.names = c(NA, -17L))
    
    # Reshape to long format, flag the expected-value columns, then plot
    ALL %>% 
        melt(id = "Year") %>%                                # one row per Year/variable pair
        na.omit() %>%                                        # drop years with no value
        mutate(est = factor(grepl("5AGG", variable))) %>%    # TRUE for expected values
        ggplot(aes(Year, value, color = variable, linetype = est)) + 
        geom_line() +
        theme_classic() +
        ggtitle("Active clusters of Red-cockaded Woodpeckers") +  
        theme(plot.title = element_text(hjust = 0.5)) +
        labs(colour = "District") + 
        theme(legend.title.align = 0.5) +
        theme(panel.border = element_rect(colour = "black", fill = NA)) +
        scale_x_continuous(limits = c(2005, 2025), 
                           breaks = c(2005, 2010, 2015, 2020, 2025)) +
        xlab("Year") + ylab("Number of active clusters")   
    

    grepl() flags the expected-value columns (those whose names contain "5AGG"), so the estimates get their own linetype.
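
On the two errors in the question: lines() and scatter.smooth() are base-graphics functions, so they cannot be added to a ggplot object with `+`. Everything has to be a ggplot2 layer. The sketch below is one possible way to get closer to the plot described in the question: one colour per district, solid lines for actual values, and points joined by a dashed line for expected values. It is only an illustration under some assumptions. It uses a two-district stand-in for the data (values copied from the question's table), swaps in tidyr's pivot_longer() for melt(), and the names `long`, `district`, `series`, `actual`, `expected`, `p` and `p_curve` are made up for this example. For a curved line it uses a quadratic fit through geom_smooth() rather than LOESS, since LOESS tends to be unstable with only five points per district.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Stand-in data: two districts at five-year steps, values copied from the
# question's table (ACA = actual, CE_5AGG = expected)
ALL <- data.frame(
  Year       = c(2005, 2010, 2015, 2020, 2025),
  SC_CE_5AGG = c(8, 10, 12, 16, 20),
  SC_ACA     = c(12, 17, 23, NA, NA),
  CO_CE_5AGG = c(23, 29, 37, 48, 61),
  CO_ACA     = c(23, 36, 46, NA, NA)
)

# Long format, with the column name split into district and series type
long <- ALL %>%
  pivot_longer(-Year, names_to = "variable", values_to = "value") %>%
  filter(!is.na(value)) %>%
  mutate(
    district = sub("_.*$", "", variable),   # text before the first underscore
    series   = ifelse(grepl("5AGG", variable), "Expected", "Actual")
  )

actual   <- filter(long, series == "Actual")
expected <- filter(long, series == "Expected")

# Solid lines for actual values; points plus straight dashed lines for expected
p <- ggplot(long, aes(Year, value, colour = district)) +
  geom_line(data = actual) +
  geom_point(data = expected) +
  geom_line(data = expected, linetype = "dashed") +
  labs(colour = "District", x = "Year", y = "Number of active clusters") +
  theme_classic()

# Variant: a dashed quadratic curve per district instead of straight segments
p_curve <- ggplot(long, aes(Year, value, colour = district)) +
  geom_line(data = actual) +
  geom_point(data = expected) +
  geom_smooth(data = expected, method = "lm", formula = y ~ poly(x, 2),
              se = FALSE, linetype = "dashed") +
  labs(colour = "District", x = "Year", y = "Number of active clusters") +
  theme_classic()

p
p_curve
```

With the full dataset the same sub() call should yield the prefixes SC, TA, OA and CO, which can be relabelled with scale_colour_discrete(labels = ...) if the full district names are wanted in the legend.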