Search code examples
rdata-manipulation

How to calculate duration in R sequentially


I have a dataset that looks like this where patients are intubated (based on ObservationValue of "Start") and extubation (based on ObservationValue of "Stop"). Occasionally there is unfortunately misentries where the patient has an extubation before the intubation, as seen in EncounterID #3 below, where they had an extubation, followed by an intubation. In those instances I have no interest in calculating their ventilation duration. My question therefore is, how can I calculate the duration in which I subtract the first extubation that happened after each intubation? My apologies if this is a basic question, I am still fairly new to using R for data management.

Here is an example of my dataset: enter image description here

And this is what I would like ideally: enter image description here

Here is dput output of the dataset I currently have:

test<-structure(list(EncounterID=structure(c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3)),
                ObservationDate=structure(c("2018-01-12 15:27:00", "2018-01-12 19:02:00", "2018-03-03 21:09:00", "2018-03-06 07:56:00", 
                                            "2019-12-03 15:54:00", "2019-12-03 20:06:00", "2019-12-04 11:40:00", "2019-12-06 08:13:00",
                                            "2019-12-23 18:50:00", "2019-12-23 16:00:00")),
                ObservationValue=structure(c("Start", "Stop", "Start", "Stop", "Start", "Stop", "Start", "Stop", "Stop", "Start"))),
                row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
test$ObservationDate<-as.POSIXct(test[["ObservationDate"]],tz="", format="%Y-%m-%d %H:%M")

Solution

  • With a bit of data wrangling and reshaping to wide format you could do:

    library(dplyr)
    library(tidyr)
    
    test |> 
      arrange(EncounterID, ObservationDate) |> 
      group_by(EncounterID) |> 
      mutate(id = cumsum(ObservationValue == "Start")) |> 
      ungroup() |> 
      pivot_wider(names_from = ObservationValue, values_from = ObservationDate) |> 
      rename(Inturbation = Start, Exturbation = Stop) |> 
      mutate(duration = difftime(Exturbation, Inturbation, units = "days"))
    #> # A tibble: 5 × 5
    #>   EncounterID    id Inturbation         Exturbation         duration      
    #>         <dbl> <int> <dttm>              <dttm>              <drtn>        
    #> 1           1     1 2018-01-12 15:27:00 2018-01-12 19:02:00 0.1493056 days
    #> 2           1     2 2018-03-03 21:09:00 2018-03-06 07:56:00 2.4493056 days
    #> 3           2     1 2019-12-03 15:54:00 2019-12-03 20:06:00 0.1750000 days
    #> 4           2     2 2019-12-04 11:40:00 2019-12-06 08:13:00 1.8562500 days
    #> 5           3     1 2019-12-23 16:00:00 2019-12-23 18:50:00 0.1180556 days