
plotting daily time series data


I have time series data (a date column and a value column) and I am trying to make a daily distribution plot.

The image below shows a weekly distribution plot, with one line per day of the week. Similarly, I am trying to make a daily distribution plot where the x axis is the month, the y axis is the value, and each line represents one day of the month (date 1, date 2, date 3 and so on). Since 30 or so lines in one subplot would be cluttered, I want to split the days across three plots: 1-10, 11-20 and 21-31, so that the first plot has 10 lines.

Weekly distribution example

Code for weekly distribution for reference:

#dummy data
start_date <- as.Date("2020-01-01")
end_date <- as.Date("2021-12-31")
date_seq <- seq(from = start_date, to = end_date, by = "day")
set.seed(123)
value <- round(runif(length(date_seq), min = 10000, max = 100000000), 0)
df <- data.frame(date = date_seq, value = value)

df$week_number <- as.numeric(format(as.Date(df$date), "%U")) + 1
df$weekday <- weekdays(as.Date(df$date))
df$year <- as.numeric(format(as.Date(df$date), "%Y"))
years <- unique(df$year)

library(ggplot2)
library(cowplot)

# Create a list of ggplots, one for each year
plots <- lapply(years, function(y) {
  year_df <- df[df$year == y, ]
  ggplot(year_df, aes(x = week_number, y = value, color = weekday)) +
    geom_line() +
    scale_color_discrete(limits = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")) +
    ggtitle(paste("Weekday Distribution", y)) +
    xlab("Week number") +
    ylab("Value") +
    theme(legend.key.size = unit(0.4, "cm")) +
    theme(plot.title = element_text(hjust = 0.5, vjust = 1.5))
})

# Stack the yearly plots vertically
plot_grid(plotlist = plots, ncol = 1)

So in the end there will be three plots (dates 1 to 10, 11 to 20 and 21 to 31), and each plot will contain two subplots (since the data spans 2020 and 2021). Can anyone help me with this?


Solution

  • Below is how I would do this. The lubridate package is your friend. For the grouping, use cut().

    The result is (in my opinion) a pretty useless clutter of lines. But this is not the only reason why I do not endorse this visualisation. I feel it somewhat defeats the point of a time series: one purpose is to visualise the autocorrelation of your data. Artificially separating out specific days of each month drastically undermines this particular advantage of (and perhaps reason for) using a time series. You're not only losing information, but also making your own analytical life much more complicated.

    library(ggplot2)
    library(dplyr)
    library(lubridate)
    
    df %>%
      mutate(day = mday(date),
             # (0,10], (10,20], (20,31]: days 1-10, 11-20 and 21-31
             day_group = cut(day, breaks = c(0, 10, 20, 31),
                             labels = c("1-10", "11-20", "21-31")),
             month = month(date, label = TRUE, abbr = TRUE)) %>%
      ggplot(aes(x = month, y = value, color = day, group = interaction(day, day_group))) +
      geom_line() +
      theme(legend.key.size = unit(0.4, "cm"),
            plot.title = element_text(hjust = 0.5, vjust = 1.5),
            axis.text.x = element_text(angle = 90)) +
      facet_wrap(year ~ day_group)
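
The grouping step depends on how cut() treats interval boundaries, which is easy to get wrong. Here is a minimal base R sketch, assuming the intended groups are days 1-10, 11-20 and 21-31 as described in the question. The names g1 and g2 and the labels are illustrative choices, not the only way to do it.

```r
days <- 1:31

# Breaks at 1, 11, 21, 31 with include.lowest = TRUE give [1,11], (11,21], (21,31],
# so day 11 lands in the first group and day 21 in the second.
g1 <- cut(days, breaks = c(1, 11, 21, 31), include.lowest = TRUE)
table(g1)

# Breaks at 0, 10, 20, 31 give (0,10], (10,20], (20,31],
# i.e. days 1-10, 11-20 and 21-31.
g2 <- cut(days, breaks = c(0, 10, 20, 31), labels = c("1-10", "11-20", "21-31"))
table(g2)

# Check the days that sit on the group boundaries
data.frame(day = c(10, 11, 20, 21), group = g2[c(10, 11, 20, 21)])
```

Tabulating the groups before plotting is a cheap way to confirm that each facet will contain the days you expect.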
    

    I feel you want to show how the "typical" 1st day of the month compares with the 2nd, and so on. For this, an aggregate visualisation might be more useful. (Still not a great idea in my view, but at least you get a better picture of your data.) You can do this by passing stat = "summary" to geom_smooth, whose geometry combines geom_line and geom_ribbon.

    df %>%
      mutate(day = mday(date), 
             month = month(date, label = T, abbr = T))  %>%
      ggplot(aes(x = day, y = value)) +
      geom_smooth(stat= "summary", alpha = .5, color = "black") +
      facet_grid(~year)
    #> No summary function supplied, defaulting to `mean_se()`
    #> No summary function supplied, defaulting to `mean_se()`
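
The line and ribbon in the aggregate plot come from mean_se(), that is, the mean plus or minus one standard error for each day of the month within each year. Below is a rough sketch of the same numbers computed by hand in base R, assuming the dummy data from the question. The names summary_df, value_mean, value_se, ymin and ymax are made up for illustration.

```r
# Recreate the dummy data from the question
set.seed(123)
date_seq <- seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by = "day")
df <- data.frame(
  date = date_seq,
  value = round(runif(length(date_seq), min = 10000, max = 100000000), 0)
)
df$day <- as.integer(format(df$date, "%d"))
df$year <- as.integer(format(df$date, "%Y"))

# Mean and standard error of the value for each (year, day of month) pair
agg_mean <- aggregate(value ~ year + day, data = df, FUN = mean)
agg_se <- aggregate(value ~ year + day, data = df,
                    FUN = function(x) sd(x) / sqrt(length(x)))

summary_df <- merge(agg_mean, agg_se, by = c("year", "day"),
                    suffixes = c("_mean", "_se"))

# Lower and upper edge of the ribbon: mean -/+ one standard error
summary_df$ymin <- summary_df$value_mean - summary_df$value_se
summary_df$ymax <- summary_df$value_mean + summary_df$value_se

head(summary_df[order(summary_df$year, summary_df$day), ])
```

Each row summarises at most 12 values (one per month), so the standard errors are wide and small differences between days should not be over-interpreted.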