Search code examples
rdifference

How to find if a value differs from the previous line and repeat that information in next line (like a loop)?


Problem

The title is misleading, since it is more complex than that. I need to find the difference of value between 2 lines but from 2 different variables, and I also need to repeat that information to all rows below.

I have a list of patients, who came in and out of the hospital where I work. Sometimes, during the hospitalisation, they are moved from an unit to another unit (emergency to intensive care for example).

I want to know how many time they actually left then returned to the hospital.

To find that, I simply need to search for a time difference between the entry day and the previous exit day. If both are equal, then this is the same stay. If they differ, then it is another stay.

It is kinda hard to explain with my english level, so I present you an example below.

Example

I work with data.table, but you can use dplyr if you wish to. I should be able to convert it easily.*

# ==== Library ====
require(data.table)

# ==== Data set ====
## The patient id, the unit id, and the entry and exit date from individuals unit
patient_id <- c(rep(x = "0034280", 4), rep(x = "0002050", 2))
unit_id    <- c(c("azr", "grt", "chd", "grt"), c("tgo", "grt"))
date_entry <- c(c("2021-07-10", "2021-07-13", "2021-07-14", "2021-07-30"),c("2021-07-29", "2021-07-30"))
date_exit  <- c(c("2021-07-10", "2021-07-14", "2021-07-25", "2021-07-30"),c("2021-07-30", "2021-07-30"))

## The variable I want to get
expected_result <- c(c(1,2,2,3), c(1,1))

## Final result expected
data_set <- data.table(patient_id, unit_id, date_entry, date_exit, expected_result)

As you can see, since "2021-07-13" on line 2 differ from "2021-07-10" on line 1, the expected result which indicate the number of hospitalisation for that patient increased.

My attempt

First, I start to create a base value for the new variable

data_set <- data_set[
  j = stay_number := 1
]

Then, with the shift function, I can check the difference between two consecutives rows.

data_set <- data_set[
  j = stay_number := data.table::fifelse(test = date_entry != data.table::shift(date_exit, type = "lag"),
                                        yes  = stay_number+1,
                                        no   = stay_number),
  by = patient_id
][
  j = stay_number := data.table::fifelse(test = base::is.na(stay_number),
                                        yes  = 1,
                                        no   = stay_number)
]

But I do not know how to repeat the number "2" to the third row, which is the same hospitalization as the row 2. Therefore, I do not know how to find "3" on the fourth row, as it is a third hospitalization for that patient.

Solution

Thanks to Ronak Shah !

data_set[, result := cumsum(date_entry != shift(date_exit, fill = FALSE)), patient_id]

Solution

  • You can increment the count if date_exit from previous row is different than the current date_entry for each patient_id.

    library(data.table)
    data_set[, result := cumsum(date_entry != shift(date_exit, fill = FALSE)), patient_id]
    
    #   patient_id unit_id date_entry  date_exit expected_result result
    #1:    0034280     azr 2021-07-10 2021-07-10               1      1
    #2:    0034280     grt 2021-07-13 2021-07-14               2      2
    #3:    0034280     chd 2021-07-14 2021-07-25               2      2
    #4:    0034280     grt 2021-07-30 2021-07-30               3      3
    #5:    0002050     tgo 2021-07-29 2021-07-30               1      1
    #6:    0002050     grt 2021-07-30 2021-07-30               1      1