
How do I combine multiple data sources in ggplot using split and sapply?


This question follows on from a previous one answered by @Rui Barradas and @Duck, but I need more help. Previous question: "How do I vectorise (automate) plot creation in R".

Basically, I need to combine three datasets into one plot with a secondary y-axis. All datasets need to be split by SITENAME, and each plot is facet-wrapped by Sampling.Year. I am using split and sapply. With the facet wrap, the plots currently look something like this:

[image: current faceted plot]

However, I'm now trying to add the two other data sources to the plots, so that they look something like this:

[image: desired plot]

But I am struggling to add the two other data sources and get them to split by SITENAME. Here is my code so far.

The plot format is recorded as a function to be applied to each element of the split list. Ideally, 'df' would be added as a geom_line on a secondary y-axis, and 'FF_start_dates' would be added as vertical dashed lines:

SITENAME_plot <- function(AllDates_TPAF){
  ggplot(AllDates_TPAF, aes(DATE, Daily.Ave.PAF)) +
    geom_point(aes(colour = Risk), size = 3) +
    scale_colour_manual(values = c("Very Low" = "dark green", "Low" = "light green",
                                   "Moderate" = "yellow", "High" = "orange", "Very High" = "red"), drop = FALSE) +
    labs(x = "Month", y = "Total PAF (% affected)") +
    scale_x_date(breaks = "1 month", labels = scales::date_format("%B")) +
    facet_wrap(~Sampling.Year, ncol = 1, scales = "free") +
    scale_y_continuous(limits = c(0, 100), sec.axis = sec_axis(~., name = "Water level (m)")) +
    # theme_bw() replaces the whole theme, so it goes before the theme() tweaks
    theme_bw() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
    theme(legend.text = element_text(size = 15)) +
    theme(axis.text = element_text(size = 15),
          axis.title = element_text(size = 15, face = "bold")) +
    guides(color = guide_legend(reverse = TRUE)) +
    ggtitle(unique(AllDates_TPAF$SITENAME))
}

Plot write function:

SITENAME_plot_write <- function(name, g, dir = "N:/abc/"){
  flname <- file.path(dir, name)
  flname <- paste0(flname, ".png")  # extension matches the png() device used below
  png(filename = flname, width = 1500, height = 1000)
  print(g)
  dev.off()
  flname
}

Apply function to list split by SITENAME:

sp1 <- split(AllDates_TPAF, AllDates_TPAF$SITENAME)
gg_list <- sapply(sp1, SITENAME_plot, simplify = FALSE)
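# Each call to SITENAME_plot_write() opens and closes its own device,
# so no extra dev.off() is needed afterwards
mapply(SITENAME_plot_write, names(gg_list), gg_list, MoreArgs = list(dir = getwd()))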

I have uploaded samples of all 3 datasets here: Sample Data

Apologies for not using gsub but there was too much data and I couldn't get it to work properly

Thanks in advance for any help you can give, even if it is just to point me towards a web tutorial of some kind.


Solution

  • You can try the following code. I used the data you shared. Just be careful with the column names across the datasets: ideally, key columns such as DATE and Sampling.Year should be present in all data frames before splitting. Also, some variables, such as Risk, were absent from the sample data, so I added an example variable with the same name. Here is the code, including a function for the plot you want:

    library(tidyverse)
    library(readxl)
    #Data
    df1 <- read_excel('Sample data.xlsx',1)
    #Create var
    df1$Risk <- c(rep(c("Very Low","Low","Moderate","High","Very High"),67),"Very High")
    #Other data
    df2 <- read_excel('Sample data.xlsx',2)
    df3 <- read_excel('Sample data.xlsx',3)
    #Split 1
    L1 <- split(df1,df1$SITENAME)
    L2 <- split(df2,df2$SITENAME)
    L3 <- split(df3,df3$`Site Name`)
    #Function to create plots
    myplot <- function(x,y,z)
    {
      #Merge x and y
      #Check for duplicates and avoid column
      y <- y[!duplicated(paste(y$DATE,y$Sampling.Year)),]
      y$SITENAME <- NULL
      xy <- merge(x,y,by = c('Sampling.Year','DATE'),all.x=TRUE)
      #Format to dates
      xy$DATE <- as.Date(xy$DATE)
      #Scale factor (na.rm = TRUE because the left join can leave NA in Height)
      scaleFactor <- max(xy$Daily.Ave.PAF, na.rm = TRUE) / max(xy$Height, na.rm = TRUE)
      #Rename for consistency in names
      names(z)[4] <- 'DATE'
      #Format date
      z$DATE <- as.Date(z$DATE)
      #Plot
      G <- ggplot(xy, aes(DATE, Daily.Ave.PAF)) +
        geom_point(aes(colour = Risk), size = 3) +
        scale_colour_manual(values=c("Very Low" = "dark green","Low" = "light green", 
                                     "Moderate" = "yellow", "High" = "orange", "Very High" = "red"), drop = FALSE) +
        scale_x_date(breaks = "1 month", labels = scales::date_format("%b %Y")) +
        geom_line(aes(x=DATE,y=Height*scaleFactor))+
        scale_y_continuous(name="Total PAF (% affected)", sec.axis=sec_axis(~./scaleFactor, name="Water level (m)"))+
        labs(x = "Month") +
        geom_vline(data = z,aes(xintercept = DATE),linetype="dashed")+
        facet_wrap(~Sampling.Year, ncol = 1, scales = "free")+
        #theme_bw() replaces the whole theme, so it goes before the theme() tweaks
        theme_bw() +
        theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
        theme(legend.text=element_text(size=15)) +
        theme(axis.text=element_text(size=15),
              axis.title=element_text(size=15,face="bold")) +
        guides(color = guide_legend(reverse = TRUE))+
        ggtitle(unique(xy$SITENAME))
      return(G)
    }
    #Create a list of plots
    Lplots <- mapply(FUN = myplot,x=L1,y=L2,z=L3,SIMPLIFY = FALSE)
    #Now format names
    vnames <- paste0(names(Lplots),'.png')
    mapply(ggsave, Lplots,filename = vnames,width = 30,units = 'cm')
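
The mapply() call above pairs the elements of L1, L2 and L3 by position, so it implicitly assumes that all three lists contain the same sites in the same order. A minimal sketch of a check for that, using made-up site names and values (the toy data frames and the `common` and `dropped` variables are illustrative and not part of the shared data), might look like this:

```r
# Toy stand-ins for the three datasets (hypothetical sites and values)
df1 <- data.frame(SITENAME = c("A", "A", "B", "C"), v = 1:4)
df2 <- data.frame(SITENAME = c("B", "A", "C"), h = c(0.5, 0.7, 0.9))
df3 <- data.frame(`Site Name` = c("C", "A"), d = 1:2, check.names = FALSE)

# Split each dataset by site, as in the code above
L1 <- split(df1, df1$SITENAME)
L2 <- split(df2, df2$SITENAME)
L3 <- split(df3, df3$`Site Name`)

# Sites that are present in all three lists
common <- Reduce(intersect, list(names(L1), names(L2), names(L3)))

# Report any site that is missing from at least one dataset
dropped <- setdiff(unique(c(names(L1), names(L2), names(L3))), common)
if (length(dropped) > 0) {
  message("Sites not in all datasets: ", paste(dropped, collapse = ", "))
}

# Subset by name so the three lists line up element by element
L1 <- L1[common]
L2 <- L2[common]
L3 <- L3[common]
```

Whether to drop sites that are missing from one dataset, or to keep them and skip the missing layer, is a design choice; this sketch simply drops them and reports which ones.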
    

    You will end up with plots like these saved in your working directory:

    [image: example output plot]

    [image: example output plot]

    Some dashed lines do not appear in the plots because the corresponding dates were not present in the data you provided.
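
On the secondary axis: ggplot2 draws every layer on the primary y scale, so the water level is multiplied by scaleFactor before plotting, and sec_axis() divides by the same factor to label the right-hand axis in metres. A small sketch of that round trip with made-up numbers (the data frame `d`, its column `PAF` and all values are hypothetical; the plot is only built if ggplot2 is installed):

```r
# Toy data (hypothetical values): PAF in percent, water level in metres
d <- data.frame(
  DATE   = as.Date("2020-01-01") + 0:4,
  PAF    = c(10, 40, 80, 20, 60),
  Height = c(0.5, 1.0, 2.0, NA, 1.5)
)

# Ratio of the two maxima; na.rm = TRUE guards against missing water levels
scale_factor <- max(d$PAF, na.rm = TRUE) / max(d$Height, na.rm = TRUE)

# Forward transform: water level expressed on the PAF scale
height_scaled <- d$Height * scale_factor

# Inverse transform: what sec_axis() applies to label the right-hand axis
height_back <- height_scaled / scale_factor

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(DATE, PAF)) +
    geom_point() +
    geom_line(aes(y = Height * scale_factor)) +
    scale_y_continuous(
      name = "Total PAF (% affected)",
      sec.axis = sec_axis(~ . / scale_factor, name = "Water level (m)")
    )
}
```

Because the factor is computed inside myplot() for each site, every site gets its own secondary-axis range, so water levels are not directly comparable by eye across different sites' plots.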