I have a data frame in R with data from an agency. Visitors (ID) enter the agency (CREATE), are then called into different rooms (CALL), and leave these rooms (DISCHARGE). Now, I want to calculate how long someone is in the agency overall, how long they are in a room, and how long they wait between two rooms. I have created an example data.frame where the last three columns already contain the desired result, which I ultimately want to achieve. These columns do not exist in the actual data frame.
For diff_since_create
, it simply needs to calculate the time since CREATE for each row within an ID.
The variable time_in_room
should represent the minutes for an id since the last 'CALL' event. This indicates the time a visitor spends in a room. For the initial 'CALL', this time starts at 0 and ends with a 'DISCHARGE' event.
The variable time_waiting
should represent the minutes for an id since the last 'DISCHARGE' event. This indicates the time a visitor (identified by id) spends outside of a room.
Multiple lines can occur between a 'CALL' and 'DISCHARGE' event, or between a 'DISCHARGE' and a 'CALL' event.
Does anyone have an idea how to calculate the last 3 rows with R? I would greatly appreciate your help!
structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), time = structure(c(1713164400,
1713164700, 1713165000, 1713165600, 1713165720, 1713165780, 1713166200,
1713166500, 1713167100, 1713164400, 1713164700, 1713165000, 1713165600,
1713166020, 1713166200, 1713166500, 1713166620, 1713167100), class = c("POSIXct",
"POSIXt"), tzone = ""), action = c("create", "call", "discharge",
"call", "work", "work", "discharge", "call", "discharge", "create",
"call", "discharge", "call", "work", "discharge", "call", "work",
"discharge"), room = c("", "room 1", "room 1", "room 2", "room 2",
"room 2", "room 2", "room 3", "room 3", "", "room 1", "room 1",
"room 2", "room 2", "room 2", "room 3", "room 3", "room 3"),
diff_since_create = c(0L, 5L, 10L, 20L, 22L, 23L, 30L, 35L,
45L, 0L, 5L, 10L, 20L, 20L, 30L, 35L, 37L, 45L), time_in_room = c("na",
"0", "5", "0", "2", "3", "10", "0", "10", "na", "0", "5",
"0", "7", "10", "0", "2", "10"), time_waiting = c(0L, 5L,
0L, 10L, 0L, 0L, 0L, 5L, 0L, 0L, 5L, 0L, 10L, 0L, 0L, 5L,
0L, 0L)), row.names = c(NA, -18L), class = "data.frame")
df |>
mutate(dsc = (time - min(time))/lubridate::dminutes(), .by = id,
.after = diff_since_create) |>
mutate(tir = if_else(action == "create", NA_real_,
(time - min(time))/lubridate::dminutes()),
.after = time_in_room,
.by = c(id, room)) |>
mutate(last_disch = if_else(action %in% c("discharge", "create"), time, NA_POSIXct_)) |>
group_by(id) |> fill(last_disch) |> ungroup() |>
mutate(tw = if_else(action == "call", (time - last_disch)/lubridate::dminutes(), 0),
.after = time_waiting) |>
select(-last_disch)
Result (This mostly matches. I presume row 14 diff_since_create was meant to be 27.)
# A tibble: 18 × 10
id time action room diff_since_create dsc time_in_room tir time_waiting tw
<int> <dttm> <chr> <chr> <int> <dbl> <chr> <dbl> <int> <dbl>
1 1 2024-04-15 00:00:00 create "" 0 0 na NA 0 0
2 1 2024-04-15 00:05:00 call "room 1" 5 5 0 0 5 5
3 1 2024-04-15 00:10:00 discharge "room 1" 10 10 5 5 0 0
4 1 2024-04-15 00:20:00 call "room 2" 20 20 0 0 10 10
5 1 2024-04-15 00:22:00 work "room 2" 22 22 2 2 0 0
6 1 2024-04-15 00:23:00 work "room 2" 23 23 3 3 0 0
7 1 2024-04-15 00:30:00 discharge "room 2" 30 30 10 10 0 0
8 1 2024-04-15 00:35:00 call "room 3" 35 35 0 0 5 5
9 1 2024-04-15 00:45:00 discharge "room 3" 45 45 10 10 0 0
10 2 2024-04-15 00:00:00 create "" 0 0 na NA 0 0
11 2 2024-04-15 00:05:00 call "room 1" 5 5 0 0 5 5
12 2 2024-04-15 00:10:00 discharge "room 1" 10 10 5 5 0 0
13 2 2024-04-15 00:20:00 call "room 2" 20 20 0 0 10 10
14 2 2024-04-15 00:27:00 work "room 2" 20 27 7 7 0 0
15 2 2024-04-15 00:30:00 discharge "room 2" 30 30 10 10 0 0
16 2 2024-04-15 00:35:00 call "room 3" 35 35 0 0 5 5
17 2 2024-04-15 00:37:00 work "room 3" 37 37 2 2 0 0
18 2 2024-04-15 00:45:00 discharge "room 3" 45 45 10 10 0 0