Search code examples
sqlrtableau-api

Determining content every hour based on exit and entry times on units


As you can see I have a dataset showing entry and exit times of an area. I would like to determine the content of this area on an hourly basis, preferably in the same dataset with a new column somehow, or alternatively in a new data set. Below is the data.

ID      Arrival (sec)   Exit (sec)
7656    152845.085      155747.92
7657    152845.085      155784.91
7658    147617          150035.747
7659    152852.811      158216.428
7660    147617          150076.503

The data shows the seconds from 1st of january 00:00 year 2000.

The purpose of this is to create a bar chart in Tableau showing this content per hour over the span of a week. R and SQL are thus preferred languages of program.

So, I tried to do it in excel and I made this:

ID      Arrival (sec)   Exit (sec)  Arrival hour (int)  Exit(int)
7656    152845.085      155747.92   42                  43
7657    152845.085      155784.91   42                  43
7658    147617          150035.747  41                  41
7659    152852.811      158216.428  42                  43
7660    147617          150076.503  41                  41

And then made the output table:

Hour    In  out Content Running total
41      2   2   0       0
42      3   0   3       3
43      0   3   -3      0
44      0   0   0       0

Can you help to do this in SQL or R? Maybe in a smoother way, that will just add the running total in a column to the first table?


Solution

  • library(dplyr)
    library(tidyr)
    
    df %>%
      mutate(Arrival_hour_int = floor(Arrival_sec/3600),
             Exit_hour_int = floor(Exit_sec/3600)) %>%
      select(Arrival_hour_int, Exit_hour_int) %>%
      gather(Key, Hour) %>%
      group_by(Hour) %>%
      summarise(In = sum(Key=='Arrival_hour_int'),
                Out = sum(Key=='Exit_hour_int')) %>%
      mutate(Content = In - Out,
             Running_total = cumsum(Content)) %>%
      data.frame()
    

    Output is:

      Hour In Out Content Running_total
    1   41  2   2       0             0
    2   42  3   0       3             3
    3   43  0   3      -3             0
    

    Sample data:

    df <- structure(list(ID = 7656:7660, Arrival_sec = c(152845.085, 152845.085, 
    147617, 152852.811, 147617), Exit_sec = c(155747.92, 155784.91, 
    150035.747, 158216.428, 150076.503)), .Names = c("ID", "Arrival_sec", 
    "Exit_sec"), class = "data.frame", row.names = c(NA, -5L))