Search code examples
rdataframedplyrtimelubridate

Calculating Visitor Durations in R Data Frame: How to Calculate Time in Agency, Rooms, and Waiting Periods?


I have a data frame in R with data from an agency. Visitors (ID) enter the agency (CREATE), are then called into different rooms (CALL), and leave these rooms (DISCHARGE). Now, I want to calculate how long someone is in the agency overall, how long they are in a room, and how long they wait between two rooms. I have created an example data.frame where the last three columns already contain the desired result, which I ultimately want to achieve. These columns do not exist in the actual data frame.

For diff_since_create, it simply needs to calculate the time since CREATE for each row within an ID.

The variable time_in_room should represent the minutes for an id since the last 'CALL' event. This indicates the time a visitor spends in a room. For the initial 'CALL', this time starts at 0 and ends with a 'DISCHARGE' event.

The variable time_waiting should represent the minutes for an id since the last 'DISCHARGE' event. This indicates the time a visitor (identified by id) spends outside of a room.

Multiple lines can occur between a 'CALL' and 'DISCHARGE' event, or between a 'DISCHARGE' and a 'CALL' event.

Does anyone have an idea how to calculate the last 3 rows with R? I would greatly appreciate your help!

structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), time = structure(c(1713164400, 
1713164700, 1713165000, 1713165600, 1713165720, 1713165780, 1713166200, 
1713166500, 1713167100, 1713164400, 1713164700, 1713165000, 1713165600, 
1713166020, 1713166200, 1713166500, 1713166620, 1713167100), class = c("POSIXct", 
"POSIXt"), tzone = ""), action = c("create", "call", "discharge", 
"call", "work", "work", "discharge", "call", "discharge", "create", 
"call", "discharge", "call", "work", "discharge", "call", "work", 
"discharge"), room = c("", "room 1", "room 1", "room 2", "room 2", 
"room 2", "room 2", "room 3", "room 3", "", "room 1", "room 1", 
"room 2", "room 2", "room 2", "room 3", "room 3", "room 3"), 
    diff_since_create = c(0L, 5L, 10L, 20L, 22L, 23L, 30L, 35L, 
    45L, 0L, 5L, 10L, 20L, 20L, 30L, 35L, 37L, 45L), time_in_room = c("na", 
    "0", "5", "0", "2", "3", "10", "0", "10", "na", "0", "5", 
    "0", "7", "10", "0", "2", "10"), time_waiting = c(0L, 5L, 
    0L, 10L, 0L, 0L, 0L, 5L, 0L, 0L, 5L, 0L, 10L, 0L, 0L, 5L, 
    0L, 0L)), row.names = c(NA, -18L), class = "data.frame")

Solution

  • df |>
      mutate(dsc = (time - min(time))/lubridate::dminutes(), .by = id, 
             .after = diff_since_create) |>
      mutate(tir = if_else(action == "create", NA_real_,
                           (time - min(time))/lubridate::dminutes()), 
             .after = time_in_room,
             .by = c(id, room)) |>
      mutate(last_disch = if_else(action %in% c("discharge", "create"), time, NA_POSIXct_)) |>
      group_by(id) |> fill(last_disch) |> ungroup() |>
      mutate(tw = if_else(action == "call", (time - last_disch)/lubridate::dminutes(), 0),
             .after = time_waiting) |>
      select(-last_disch)
    

    Result (This mostly matches. I presume row 14 diff_since_create was meant to be 27.)

    # A tibble: 18 × 10
          id time                action    room     diff_since_create   dsc time_in_room   tir time_waiting    tw
       <int> <dttm>              <chr>     <chr>                <int> <dbl> <chr>        <dbl>        <int> <dbl>
     1     1 2024-04-15 00:00:00 create    ""                       0     0 na              NA            0     0
     2     1 2024-04-15 00:05:00 call      "room 1"                 5     5 0                0            5     5
     3     1 2024-04-15 00:10:00 discharge "room 1"                10    10 5                5            0     0
     4     1 2024-04-15 00:20:00 call      "room 2"                20    20 0                0           10    10
     5     1 2024-04-15 00:22:00 work      "room 2"                22    22 2                2            0     0
     6     1 2024-04-15 00:23:00 work      "room 2"                23    23 3                3            0     0
     7     1 2024-04-15 00:30:00 discharge "room 2"                30    30 10              10            0     0
     8     1 2024-04-15 00:35:00 call      "room 3"                35    35 0                0            5     5
     9     1 2024-04-15 00:45:00 discharge "room 3"                45    45 10              10            0     0
    10     2 2024-04-15 00:00:00 create    ""                       0     0 na              NA            0     0
    11     2 2024-04-15 00:05:00 call      "room 1"                 5     5 0                0            5     5
    12     2 2024-04-15 00:10:00 discharge "room 1"                10    10 5                5            0     0
    13     2 2024-04-15 00:20:00 call      "room 2"                20    20 0                0           10    10
    14     2 2024-04-15 00:27:00 work      "room 2"                20    27 7                7            0     0
    15     2 2024-04-15 00:30:00 discharge "room 2"                30    30 10              10            0     0
    16     2 2024-04-15 00:35:00 call      "room 3"                35    35 0                0            5     5
    17     2 2024-04-15 00:37:00 work      "room 3"                37    37 2                2            0     0
    18     2 2024-04-15 00:45:00 discharge "room 3"                45    45 10              10            0     0