I am using a combination of ggplot2 and purrr in R Studio to loop through a dataframe and generate density plots. Here is a mock dataframe, similar to the structure of what I am working with:-
#load relevant libraries
library(ggplot2)
library(dplyr)
library(purrr)
library(gridExtra)
#mock dataframe
set.seed(123)
Duration<-floor(rnorm(1000, mean=200, sd=50))
DateTime<-seq.POSIXt(from = as.POSIXct("2020-08-01 01:00:00", tz = Sys.timezone()), length.out = 1000, by = "hours")
df<-cbind(Duration,DateTime)
df<-as.data.frame(df)
df$Duration<-as.integer(df$Duration)
df$DateTime<-seq.POSIXt(from = as.POSIXct("2020-08-01 01:00:00", tz = Sys.timezone()),
length.out = 1000, by = "hours")#re-doing this to stop the annoying change back to numeric
df$WeekNumber<-isoweek(df$DateTime)
#create a "period" column
setDT(df)[WeekNumber>=31 & WeekNumber <=32, Period:="Period 1"]
df[WeekNumber>=33 & WeekNumber <=35, Period:="Period 2"]
df[WeekNumber>=36 & WeekNumber <=37, Period:="Period 3"]
df$Period<-factor(df$Period, levels = c("Period 1", "Period 2", "Period 3"))
And here is the code which uses purrr to loop through the dataframe to generate a density plot for each week:-
densplot<-df %>%
group_by(WeekNumber) %>%
summarise() %>%
pull() %>%
# run map() instead of for()
map(~{
df %>%
# filter for each value
filter(WeekNumber == .x) %>%
# run unique density plot
ggplot(aes(group=WeekNumber)) +
geom_density(aes(Duration))+
ggtitle(paste0("Week ",.x," duration"), subtitle = "Log10")+
scale_x_log10()
})
#call grid.arrange to create a faceted version of the plot
do.call(grid.arrange,densplot)
Which gives this:-
What I am trying to do is colour the density lines by "Period" for aid of interpretation. This would be easy enough using ggplot2 on it's own but I would like to use it in my purrr pipeline. However, if I specify ggplot(aes(group=WeekNumber, colour=Period))
or geom_density(aes(Duration))
, I get this:-
Plus, a legend for each individual plot, which does look untidy. I would like to be able to colour each individual Period and a single legend displaying the colour of all three Periods (perhaps placed on the right hand side). Is there a way to do this?
It would be better to use facet_wrap()
in order to avoid issues with colors. Here the code for your options:
library(ggplot2)
library(dplyr)
#Code
df %>% mutate(WeekNumber=paste0("Week ",WeekNumber," duration")) %>%
ggplot(aes(x=Duration,group=WeekNumber,color=Period)) +
geom_density()+
scale_x_log10()+
facet_wrap(.~WeekNumber,scales='free')
Output:
Update: If you want to iterate, you can adapt a list strategy by splitting your df
by period. Then using a function for the plot and patchwork
package you can get the expected plot. As additional remark, if you wish different colors you can hack the pipeline by defining the colors in your dataframe before splitting. I did in a practical way but you could use a color palette if more periods are present. Here the code:
library(patchwork)
#Add Colors to df
dfcol <- data.frame(Period=unique(df$Period),color=c('blue','red','green'),stringsAsFactors = F)
#Add to df
df$Colors <- dfcol[match(df$Period,dfcol$Period),"color"]
#Approach 2
#Create a list
List <- split(df,df$WeekNumber)
#Plot function
myplot <- function(x)
{
#Extract color
mycol <- unique(x$Colors)
#Plots
p1 <- ggplot(x,aes(x=Duration,group=WeekNumber,color=Period)) +
geom_density()+
scale_x_log10()+
scale_color_manual(values = mycol)+
ggtitle(paste0("Week ",unique(x$WeekNumber)," duration"), subtitle = "Log10")+
theme(legend.title = element_blank())
return(p1)
}
#Apply
L1 <- lapply(List,myplot)
#Wrap plots
combined <- wrap_plots(L1,ncol = 3)
combined + plot_layout(guides = "collect")
Output: