r ggplot2 group-by lubridate

R: ggplot of average daily counts by month


I am trying to plot average daily trip counts by month. However, I cannot work out how to plot the mean number of trips per day for each month instead of the total monthly trips.

The days of the week and months have already been converted from numeric type to abbreviations and have also been ordered (type: ).

Here's what I've tried for the plot.

by_day <- df_temp %>%
  group_by(Start.Day)

ggplot(by_day, aes(x=Start.Month,
                    fill=Start.Month)) +
  geom_bar() +
  scale_fill_brewer(palette = "Paired") +
  labs(title="Number of Daily Trips by Month",
       x=" ",
       y="Number of Daily Trips")

Here's the plot I am trying to replicate:

[image: target plot]


Solution

  • You are almost there. geom_bar() counts the rows in each month, so your plot shows monthly totals; to plot an average, summarise the data first and plot the summary. Since you did not share a reproducible example, I simulated your data. You may need to adapt the variable names and/or correct my assumptions.

    {lubridate} is a powerful package for working with date-times. It comes in handy when binning dates for summaries.

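    # packages used below: dplyr for %>%, mutate(), group_by(), summarise();
    # ggplot2 for the plot
    library(dplyr)
    library(ggplot2)

    # simulating your data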
    ## a series of dates from June through October
    days <- seq(from = lubridate::ymd("2020-06-01")
                ,to  = lubridate::ymd("2020-10-30")
                ,by  = "1 day")
    ## random trips on each day
    set.seed(666)
    trips <- sample(2000:5000, length(days), replace = TRUE)
    
    # putting things together in a data frame
    df_temp <- data.frame(date = days, counts = trips) %>%
      # I assume the variable Start.Month is the monthly bin
      # let's use lubridate to "bin" the month from the date
      mutate(Start.Month = lubridate::floor_date(date, unit = "month"))
    
    # aggregate trips for each month, calculate average daily trips
    by_month <- df_temp %>%
      group_by(Start.Month) %>%            # group by the binning variable
      summarise(Avg.Trips = mean(counts))  # calculate the mean for each group
    
    ggplot( data = by_month
          , aes(x = Start.Month, y = Avg.Trips
          , fill=as.factor(Start.Month))   # to work with a discrete palette, factorise
          ) +
    # ------------ bar layer -----------------------------------------
    ## instead of geom_bar(... stat = "identity"), you can use geom_col()
    ## and define the fill colour
      geom_col() +  
      scale_fill_brewer(palette = "Paired") +
    
    # ------------ if you like provide context with annotation -------
      geom_text(aes(label = Avg.Trips %>% round(2)), vjust = 1) +
    
    # ------------ finalise plot with labels, theme, etc.
    
      labs(title="Number of Daily Trips by Month",
           x=NULL, # setting an unused lab to NULL is better than printing empty " "!
           y="Number of Daily Trips"
           ) + 
      theme_minimal() +
      theme(legend.position = "none")  # to suppress colour legend
    

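The simulated data in this answer already has one count per day. If your real df_temp instead has one row per trip (which your use of geom_bar() to count rows suggests), the daily counts have to be built first. Here is a minimal sketch, assuming a date column that I call Start.Date (a hypothetical name; your data may store it differently):

```r
library(dplyr)
library(ggplot2)

# Hypothetical trip-level data: one row per trip.
# `Start.Date` is an assumed column name, not taken from the question.
set.seed(1)
trips <- data.frame(
  Start.Date = sample(
    seq(as.Date("2020-06-01"), as.Date("2020-08-31"), by = "day"),
    size = 5000, replace = TRUE
  )
)

avg_daily <- trips %>%
  # step 1: number of trips on each calendar day
  count(Start.Date, name = "Daily.Trips") %>%
  # step 2: label each day with its month (ordered factor of abbreviations)
  mutate(Start.Month = lubridate::month(Start.Date, label = TRUE)) %>%
  # step 3: average the daily counts within each month
  group_by(Start.Month) %>%
  summarise(Avg.Trips = mean(Daily.Trips))

# geom_col() plots the pre-computed averages as bar heights
p <- ggplot(avg_daily, aes(x = Start.Month, y = Avg.Trips, fill = Start.Month)) +
  geom_col() +
  scale_fill_brewer(palette = "Paired") +
  labs(title = "Number of Daily Trips by Month",
       x = NULL,
       y = "Number of Daily Trips") +
  theme(legend.position = "none")
```

Note that count() only returns days that have at least one trip, so days without any trips are left out of the mean. If such days exist in your data and should count as zero, they would need to be added as zero rows before step 3.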