As you can see I have a dataset showing entry and exit times of an area. I would like to determine the content of this area on an hourly basis, preferably in the same dataset with a new column somehow, or alternatively in a new data set. Below is the data.
ID Arrival (sec) Exit (sec)
7656 152845.085 155747.92
7657 152845.085 155784.91
7658 147617 150035.747
7659 152852.811 158216.428
7660 147617 150076.503
The data shows the seconds from 1st of january 00:00 year 2000.
The purpose of this is to create a bar chart in Tableau showing this content per hour over the span of a week. R and SQL are thus preferred languages of program.
So, I tried to do it in excel and I made this:
ID Arrival (sec) Exit (sec) Arrival hour (int) Exit(int)
7656 152845.085 155747.92 42 43
7657 152845.085 155784.91 42 43
7658 147617 150035.747 41 41
7659 152852.811 158216.428 42 43
7660 147617 150076.503 41 41
And then made the output table:
Hour In out Content Running total
41 2 2 0 0
42 3 0 3 3
43 0 3 -3 0
44 0 0 0 0
Can you help to do this in SQL or R? Maybe in a smoother way, that will just add the running total in a column to the first table?
library(dplyr)
library(tidyr)
df %>%
mutate(Arrival_hour_int = floor(Arrival_sec/3600),
Exit_hour_int = floor(Exit_sec/3600)) %>%
select(Arrival_hour_int, Exit_hour_int) %>%
gather(Key, Hour) %>%
group_by(Hour) %>%
summarise(In = sum(Key=='Arrival_hour_int'),
Out = sum(Key=='Exit_hour_int')) %>%
mutate(Content = In - Out,
Running_total = cumsum(Content)) %>%
data.frame()
Output is:
Hour In Out Content Running_total
1 41 2 2 0 0
2 42 3 0 3 3
3 43 0 3 -3 0
Sample data:
df <- structure(list(ID = 7656:7660, Arrival_sec = c(152845.085, 152845.085,
147617, 152852.811, 147617), Exit_sec = c(155747.92, 155784.91,
150035.747, 158216.428, 150076.503)), .Names = c("ID", "Arrival_sec",
"Exit_sec"), class = "data.frame", row.names = c(NA, -5L))