Search code examples
rplottimedensity-plot

Plotting a "Popular times" graph in R


I want to create a graph that shows the busy times in a hostal I am working in. Ideally, I would be able to create two density curves (or bars), one for the regular weekdays and one for weekends and holidays. I have the check-in and check-out times of all our clients and not much more.

I am picturing getting something similar to this type of graph (although, bear in mind people here stay overnight).

Popular times - Google

Any ideas on how to tackle this?


Solution

  • First, let's generate some check-in/check-out data:

    library(tidyverse)
    library(lubridate)
    
    nclients <- 40
    startinterval <- '2021/11/04'
    endinterval <- '2021/11/07'
    
    
    set.seed(1236)
    data <- tibble(client = 1:nclients,
                   checkin = ymd_hms(sample(seq(as.POSIXct(startinterval),
                                                as.POSIXct(endinterval),
                                                by="sec"),
                                            nclients,
                                            replace = TRUE))
    )
    
    
    data <- data %>%
      mutate(inte = floor(sample(x = 1800:28800, size = nrow(data), replace = TRUE)),
             checkout = checkin + inte) %>%
      select(-inte)
    
    head(data)
    
    # # A tibble: 6 x 3
    # client checkin             checkout
    # <int> <dttm>              <dttm>
    # 1      1 2021-11-06 23:21:22 2021-11-07 06:14:03
    # 2      2 2021-11-04 19:22:20 2021-11-04 22:46:54
    # 3      3 2021-11-06 21:44:56 2021-11-07 04:22:11
    # 4      4 2021-11-05 04:32:33 2021-11-05 09:32:05
    # 5      5 2021-11-05 13:27:55 2021-11-05 15:34:22
    # 6      6 2021-11-04 15:31:23 2021-11-04 22:41:26
    

    We then need a function to convert that data into one that specifies the hours the client were registered (i.e. between check-in and check-out). Modifying part of Bas's answer here, we get:

    whathours <- function(start_time, end_time) {
      time_interval <- interval(start_time, end_time)
      
      start_hour <- floor_date(start_time, unit = "hour")
      end_hour <- ceiling_date(end_time, unit = "hour")
      diff_hours <- as.double(difftime(end_hour, start_hour, "hours"))
      
      hours <- start_hour + hours(0:diff_hours)
      hour_intervals <- int_diff(hours)
      
      hours <- hours[1:(length(hours)-1)]
      tibble(Day = date(hours),
             HourOfDay = hour(hours))
    
    }
    

    Mapping that function, we then generate a dataset that groups how many clients were registered in the hostal for every hour, grouping for type of day as requested:

    data2 <- data %>%  
      mutate(start_time = as_datetime(checkin),
             end_time = as_datetime(checkout)) %>% 
      as_tibble() %>% 
      mutate(infoperhour = purrr::map2(start_time, end_time, whathours)) %>% 
      unnest(infoperhour) %>% 
      group_by(Day, HourOfDay) %>% 
      summarise(day = wday(Day),
                typeofday = ifelse(day %in% c(1:5), "weekday", "weekend")) %>%
      group_by(typeofday) %>% count(HourOfDay, sort = TRUE) %>%
      ungroup() %>%
      arrange(typeofday, HourOfDay)
    head(data2)
    
    # # A tibble: 6 x 3
    # typeofday HourOfDay     n
    # <chr>         <int> <int>
    # 1 weekday           0     3
    # 2 weekday           1     3
    # 3 weekday           2     5
    # 4 weekday           3     5
    # 5 weekday           4     5
    # 6 weekday           5     4
    

    Finally, we plot the data:

    data2 %>%
      ggplot(.,aes(x = HourOfDay, color = typeofday))+
      geom_density()+
      scale_x_continuous(limits = c(0, 23), breaks = seq(0, 23, by = 1))+
      theme_classic()
    

    Final plot