The title is misleading, since it is more complex than that. I need to find the difference of value between 2 lines but from 2 different variables, and I also need to repeat that information to all rows below.
I have a list of patients, who came in and out of the hospital where I work. Sometimes, during the hospitalisation, they are moved from an unit to another unit (emergency to intensive care for example).
I want to know how many time they actually left then returned to the hospital.
To find that, I simply need to search for a time difference between the entry day and the previous exit day. If both are equal, then this is the same stay. If they differ, then it is another stay.
It is kinda hard to explain with my english level, so I present you an example below.
I work with data.table
, but you can use dplyr
if you wish to. I should be able to convert it easily.*
# ==== Library ====
require(data.table)
# ==== Data set ====
## The patient id, the unit id, and the entry and exit date from individuals unit
patient_id <- c(rep(x = "0034280", 4), rep(x = "0002050", 2))
unit_id <- c(c("azr", "grt", "chd", "grt"), c("tgo", "grt"))
date_entry <- c(c("2021-07-10", "2021-07-13", "2021-07-14", "2021-07-30"),c("2021-07-29", "2021-07-30"))
date_exit <- c(c("2021-07-10", "2021-07-14", "2021-07-25", "2021-07-30"),c("2021-07-30", "2021-07-30"))
## The variable I want to get
expected_result <- c(c(1,2,2,3), c(1,1))
## Final result expected
data_set <- data.table(patient_id, unit_id, date_entry, date_exit, expected_result)
As you can see, since "2021-07-13" on line 2 differ from "2021-07-10" on line 1, the expected result which indicate the number of hospitalisation for that patient increased.
First, I start to create a base value for the new variable
data_set <- data_set[
j = stay_number := 1
]
Then, with the shift function, I can check the difference between two consecutives rows.
data_set <- data_set[
j = stay_number := data.table::fifelse(test = date_entry != data.table::shift(date_exit, type = "lag"),
yes = stay_number+1,
no = stay_number),
by = patient_id
][
j = stay_number := data.table::fifelse(test = base::is.na(stay_number),
yes = 1,
no = stay_number)
]
But I do not know how to repeat the number "2" to the third row, which is the same hospitalization as the row 2. Therefore, I do not know how to find "3" on the fourth row, as it is a third hospitalization for that patient.
Thanks to Ronak Shah !
data_set[, result := cumsum(date_entry != shift(date_exit, fill = FALSE)), patient_id]
You can increment the count if date_exit
from previous row is different than the current date_entry
for each patient_id
.
library(data.table)
data_set[, result := cumsum(date_entry != shift(date_exit, fill = FALSE)), patient_id]
# patient_id unit_id date_entry date_exit expected_result result
#1: 0034280 azr 2021-07-10 2021-07-10 1 1
#2: 0034280 grt 2021-07-13 2021-07-14 2 2
#3: 0034280 chd 2021-07-14 2021-07-25 2 2
#4: 0034280 grt 2021-07-30 2021-07-30 3 3
#5: 0002050 tgo 2021-07-29 2021-07-30 1 1
#6: 0002050 grt 2021-07-30 2021-07-30 1 1