Search code examples
rggplot2dplyrpurrrdensity-plot

Creating density plots using ggplot2 and purrr; colour of density line based on group


I am using a combination of ggplot2 and purrr in R Studio to loop through a dataframe and generate density plots. Here is a mock dataframe, similar to the structure of what I am working with:-

#load relevant libraries

library(ggplot2)
library(dplyr)
library(purrr)
library(gridExtra)

#mock dataframe
set.seed(123)
Duration<-floor(rnorm(1000, mean=200, sd=50))
DateTime<-seq.POSIXt(from = as.POSIXct("2020-08-01 01:00:00", tz = Sys.timezone()), length.out = 1000, by = "hours")
df<-cbind(Duration,DateTime)
df<-as.data.frame(df)
df$Duration<-as.integer(df$Duration)
df$DateTime<-seq.POSIXt(from = as.POSIXct("2020-08-01 01:00:00", tz = Sys.timezone()), 
                        length.out = 1000, by = "hours")#re-doing this to stop the annoying change back to numeric
df$WeekNumber<-isoweek(df$DateTime)
#create a "period" column
setDT(df)[WeekNumber>=31 & WeekNumber <=32, Period:="Period 1"]
df[WeekNumber>=33 & WeekNumber <=35, Period:="Period 2"]
df[WeekNumber>=36 & WeekNumber <=37, Period:="Period 3"]
df$Period<-factor(df$Period, levels = c("Period 1", "Period 2", "Period 3"))

And here is the code which uses purrr to loop through the dataframe to generate a density plot for each week:-

densplot<-df %>%
  group_by(WeekNumber) %>%
  summarise() %>%
  pull() %>% 
  # run map() instead of for()
  map(~{
    df %>%
      # filter for each value 
      filter(WeekNumber == .x) %>%
            # run unique density plot
      ggplot(aes(group=WeekNumber)) +
      geom_density(aes(Duration))+
      ggtitle(paste0("Week ",.x," duration"), subtitle = "Log10")+
      scale_x_log10()
  })

#call grid.arrange to create a faceted version of the plot
do.call(grid.arrange,densplot)

Which gives this:-

Result of running do.call(grid.arrange,densplot)

What I am trying to do is colour the density lines by "Period" for aid of interpretation. This would be easy enough using ggplot2 on it's own but I would like to use it in my purrr pipeline. However, if I specify ggplot(aes(group=WeekNumber, colour=Period)) or geom_density(aes(Duration)), I get this:- Wrong outcome

Plus, a legend for each individual plot, which does look untidy. I would like to be able to colour each individual Period and a single legend displaying the colour of all three Periods (perhaps placed on the right hand side). Is there a way to do this?


Solution

  • It would be better to use facet_wrap() in order to avoid issues with colors. Here the code for your options:

    library(ggplot2)
    library(dplyr)
    #Code
    df %>% mutate(WeekNumber=paste0("Week ",WeekNumber," duration")) %>%
      ggplot(aes(x=Duration,group=WeekNumber,color=Period)) +
      geom_density()+
      scale_x_log10()+
      facet_wrap(.~WeekNumber,scales='free')
    

    Output:

    enter image description here

    Update: If you want to iterate, you can adapt a list strategy by splitting your df by period. Then using a function for the plot and patchwork package you can get the expected plot. As additional remark, if you wish different colors you can hack the pipeline by defining the colors in your dataframe before splitting. I did in a practical way but you could use a color palette if more periods are present. Here the code:

    library(patchwork)
    #Add Colors to df
    dfcol <- data.frame(Period=unique(df$Period),color=c('blue','red','green'),stringsAsFactors = F)
    #Add to df
    df$Colors <- dfcol[match(df$Period,dfcol$Period),"color"]
    #Approach 2
    #Create a list
    List <- split(df,df$WeekNumber)
    #Plot function
    myplot <- function(x)
    {
      #Extract color
      mycol <- unique(x$Colors)
      #Plots
      p1 <- ggplot(x,aes(x=Duration,group=WeekNumber,color=Period)) +
        geom_density()+
        scale_x_log10()+
        scale_color_manual(values = mycol)+
        ggtitle(paste0("Week ",unique(x$WeekNumber)," duration"), subtitle = "Log10")+
        theme(legend.title = element_blank())
      return(p1)
    }
    #Apply
    L1 <- lapply(List,myplot)
    #Wrap plots
    combined <- wrap_plots(L1,ncol = 3)
    combined + plot_layout(guides = "collect")
    

    Output:

    enter image description here