
Cross-tabulation based on start and end time in R


I have a data frame like the following:

    PERSONID STRTTIME ENDTIME origin dest
1          1     1000    1015      H    O
2          1     1510    1530      O    H
3          2     1800    2030      H    W
4          2      700     900      W    H
5          3     1430    1445      O    H
6          3      845     900      H    O
7          1     1115    1130      H    W
8          1     2330    2340      W    H
9          1     1715    1850      H    O
10         1      900     920      H    W
11         1     2055    2145      O    H
12         1     1530    1605      W    H
13         1     1415    1440      W    W
14         1      550     605      H    O
15         1      805     815      O    H
16         1      730     740      H    O
17         1      700     715      O    H
18         1     1500    1515      H    O
19         1     1700    1715      O    H
20         1     1015    1025      O    O
21         1      830     845      H    O
22         1     1050    1105      O    H
23         2     1800    1818      H    O
24         2     2135    2154      O    H
25         1      800     820      H    W
26         1     1100    1120      W    O
27         1     1125    1130      O    H
28         2     1200    1205      H    O
29         2     1315    1330      O    O
30         2     1405    1415      O    H
31         1      800     945      O    W
32         1     1800    2000      W    H
33         2      900    1100      H    W
34         2     1830    2030      W    H
35         1     1725    1855      W    H
36         1     1200    1210      W    O
37         1      710     820      H    W
38         1     1245    1255      O    W
39         2     1625    1700      O    H
40         2     1535    1554      W    O
41         1     1653    1657      W    O
42         1      809     813      O    O
43         1     1718    1726      O    H
44         1      824     828      O    W
45         1      745     752      H    O
46         2      738     850      O    O
47         2      730     733      H    O
48         2      858     949      O    W
49         1        0      30      W    H
50         1     1300    1325      H    W

I want to cross-tabulate "origin" and "dest", but separately for each calendar hour (1am, 2am, ..., 11pm), based on STRTTIME and ENDTIME. How could I do it? STRTTIME and ENDTIME are in HHMM format (e.g., 1015 = 10:15am).

This is the kind of table I want, but one for each hour:

         H      O      W
  H   5446 131981  53612
  O 143058 130553  21667
  W  45056  30682  11568

Solution

  • Suppose your data is df. This script computes the total travel time in hours for each origin-destination pair (if I understand the question correctly):

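    library(dplyr)
    library(tidyr)

    df %>%
        # Zero-pad the HHMM integers (e.g. 700 -> "0700"), parse them as times,
        # and take each trip's duration in hours as a plain number
        mutate(hours = as.numeric(difftime(
            strptime(sprintf("%04d", ENDTIME), "%H%M"),
            strptime(sprintf("%04d", STRTTIME), "%H%M"), units = "hours"))) %>%
        # Total duration for each origin-destination pair
        group_by(origin, dest) %>%
        summarise(hours = sum(hours), .groups = "drop") %>%
        # Spread destinations into columns to get the origin-by-dest layout
        pivot_wider(names_from = dest, values_from = hours)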
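
If what you need is a count of trips per calendar hour instead of total duration, one possible approach is to expand each trip into one row per hour it touches and then build a three-way table. This is only a sketch in base R, and it rests on assumptions that are not stated in the question: a trip is counted in every hour it overlaps, a trip that ends exactly on the hour (e.g. 900) is not counted in that hour, and no trip crosses midnight (ENDTIME >= STRTTIME). The names `long`, `places` and `by_hour` are made up for illustration, and only a few rows of the data are used.

```r
# A few rows copied from the question's data frame
df <- data.frame(
  PERSONID = c(1, 1, 2, 2, 3),
  STRTTIME = c(1000, 1510, 1800, 700, 1430),
  ENDTIME  = c(1015, 1530, 2030, 900, 1445),
  origin   = c("H", "O", "H", "W", "O"),
  dest     = c("O", "H", "W", "H", "H")
)

# Calendar hour (0-23) in which each trip starts and ends
start_hour <- df$STRTTIME %/% 100
end_hour   <- df$ENDTIME %/% 100

# Assumption: a trip ending exactly on the hour does not count toward that hour
on_the_hour <- df$ENDTIME %% 100 == 0 & end_hour > start_hour
end_hour[on_the_hour] <- end_hour[on_the_hour] - 1

# One row per (trip, hour) pair; assumes end_hour >= start_hour for every trip
n_hours <- end_hour - start_hour + 1
long <- df[rep(seq_len(nrow(df)), n_hours), c("origin", "dest")]
long$hour <- unlist(Map(seq, start_hour, end_hour))

# Fix the factor levels so every hourly table has the same H/O/W layout
places <- c("H", "O", "W")
long$origin <- factor(long$origin, levels = places)
long$dest   <- factor(long$dest, levels = places)
long$hour   <- factor(long$hour, levels = 0:23)

# 3 x 3 x 24 array: origin by dest, one slice per hour
by_hour <- table(origin = long$origin, dest = long$dest, hour = long$hour)

# Cross-tabulation for trips under way between 18:00 and 18:59
by_hour[, , "18"]
```

If you would rather count each trip only once, in the hour it starts, you can skip the expansion step and call `table(df$origin, df$dest, df$STRTTIME %/% 100)` directly.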