I want to create a graph that shows the busy times in a hostal I am working in. Ideally, I would be able to create two density curves (or bars), one for the regular weekdays and one for weekends and holidays. I have the check-in and check-out times of all our clients and not much more.
I am picturing getting something similar to this type of graph (although, bear in mind people here stay overnight).
Any ideas on how to tackle this?
First, let's generate some check-in/check-out data:
library(tidyverse)
library(lubridate)
nclients <- 40
startinterval <- '2021/11/04'
endinterval <- '2021/11/07'
set.seed(1236)
data <- tibble(client = 1:nclients,
checkin = ymd_hms(sample(seq(as.POSIXct(startinterval),
as.POSIXct(endinterval),
by="sec"),
nclients,
replace = TRUE))
)
data <- data %>%
mutate(inte = floor(sample(x = 1800:28800, size = nrow(data), replace = TRUE)),
checkout = checkin + inte) %>%
select(-inte)
head(data)
# # A tibble: 6 x 3
# client checkin checkout
# <int> <dttm> <dttm>
# 1 1 2021-11-06 23:21:22 2021-11-07 06:14:03
# 2 2 2021-11-04 19:22:20 2021-11-04 22:46:54
# 3 3 2021-11-06 21:44:56 2021-11-07 04:22:11
# 4 4 2021-11-05 04:32:33 2021-11-05 09:32:05
# 5 5 2021-11-05 13:27:55 2021-11-05 15:34:22
# 6 6 2021-11-04 15:31:23 2021-11-04 22:41:26
We then need a function to convert that data into one that specifies the hours the client were registered (i.e. between check-in and check-out). Modifying part of Bas's answer here, we get:
whathours <- function(start_time, end_time) {
time_interval <- interval(start_time, end_time)
start_hour <- floor_date(start_time, unit = "hour")
end_hour <- ceiling_date(end_time, unit = "hour")
diff_hours <- as.double(difftime(end_hour, start_hour, "hours"))
hours <- start_hour + hours(0:diff_hours)
hour_intervals <- int_diff(hours)
hours <- hours[1:(length(hours)-1)]
tibble(Day = date(hours),
HourOfDay = hour(hours))
}
Mapping that function, we then generate a dataset that groups how many clients were registered in the hostal for every hour, grouping for type of day as requested:
data2 <- data %>%
mutate(start_time = as_datetime(checkin),
end_time = as_datetime(checkout)) %>%
as_tibble() %>%
mutate(infoperhour = purrr::map2(start_time, end_time, whathours)) %>%
unnest(infoperhour) %>%
group_by(Day, HourOfDay) %>%
summarise(day = wday(Day),
typeofday = ifelse(day %in% c(1:5), "weekday", "weekend")) %>%
group_by(typeofday) %>% count(HourOfDay, sort = TRUE) %>%
ungroup() %>%
arrange(typeofday, HourOfDay)
head(data2)
# # A tibble: 6 x 3
# typeofday HourOfDay n
# <chr> <int> <int>
# 1 weekday 0 3
# 2 weekday 1 3
# 3 weekday 2 5
# 4 weekday 3 5
# 5 weekday 4 5
# 6 weekday 5 4
Finally, we plot the data:
data2 %>%
ggplot(.,aes(x = HourOfDay, color = typeofday))+
geom_density()+
scale_x_continuous(limits = c(0, 23), breaks = seq(0, 23, by = 1))+
theme_classic()