
Making censored variables for survival analysis from dates


I am a beginner in R, and I would like to do a survival analysis on a dataset about light bulbs. I want to calculate the lifetime of each light bulb, so I need the time between date_fixed in one row and date_broken in the next row (for example, date_fixed in row 1 and date_broken in row 2).

I know I can use difftime(time1, time2, units = "days") to calculate the time between date_fixed and date_broken in the same row, but that gives the time the light bulb was broken, which is not what I am interested in.

I have provided a small sample of my data below. For each light bulb at a particular location, I have the date it broke and the date it was fixed.

(Besides the columns given in the example below, I have other features that should have predictive value.)

#  date_broken date_fixed lightbulb location
# 1   26-2-2015  17-3-2015     1        A
# 2   19-3-2015  26-3-2015     1        A
# 3   26-3-2015  26-3-2015     1        A
# 4   17-4-2015  29-4-2015     2        B
# 5   19-6-2015  25-6-2015     2        B
# 6    9-7-2015  30-7-2015     2        B



ds <- data.frame(
  date_broken = c("26-2-2015", "19-3-2015", "26-3-2015",
                  "17-4-2015", "19-6-2015", "9-7-2015"),
  date_fixed  = c("17-3-2015", "26-3-2015", "26-3-2015",
                  "29-4-2015", "25-6-2015", "30-7-2015"),
  lightbulb   = c("1", "1", "1", "2", "2", "2"),
  location    = c("A", "A", "A", "B", "B", "B")
)

Solution

  • First you'll need to convert your dates from strings to proper Date objects, as @Gaurav suggested. Then you'll need to group by lightbulb, otherwise differences would be taken across different bulbs and be meaningless. Here is an alternative using the lubridate and data.table packages:

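    library(lubridate)
    library(data.table)

    # Parse the day-month-year strings into Date objects
    ds$date_broken <- dmy(ds$date_broken)
    ds$date_fixed <- dmy(ds$date_fixed)

    # Convert the data.frame to a data.table by reference
    setDT(ds)

    # Per lightbulb: days from the previous row's date_broken to this row's date_fixed.
    # units must be named, because the third positional argument of difftime() is tz.
    ds[, dt := difftime(date_fixed, shift(date_broken, 1L, type = "lag"), units = "days"), by = lightbulb]
    ds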
    

    Which produces:

       ##    date_broken date_fixed lightbulb location      dt
       ## 1:  2015-02-26 2015-03-17         1        A NA days
       ## 2:  2015-03-19 2015-03-26         1        A 28 days
       ## 3:  2015-03-26 2015-03-26         1        A  7 days
       ## 4:  2015-04-17 2015-04-29         2        B NA days
       ## 5:  2015-06-19 2015-06-25         2        B 69 days
       ## 6:  2015-07-09 2015-07-30         2        B 41 days
    

    In future questions, it helps a lot if you include the expected results along with your question.
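
The dt column above runs from the previous break to the current fix. If the lifetime is instead taken as the question describes it (from date_fixed in one row to date_broken in the next row of the same lightbulb), a minimal sketch could look like the following. It also adds one right-censored interval per lightbulb, from its last fix to the end of observation. The names study_end, start, end, event and time are assumptions introduced here for illustration, and the cut-off date is hypothetical; base as.Date() stands in for lubridate's dmy().

```r
library(data.table)

# Sample data from the question (stray backtick in lightbulb removed)
ds <- data.table(
  date_broken = c("26-2-2015", "19-3-2015", "26-3-2015",
                  "17-4-2015", "19-6-2015", "9-7-2015"),
  date_fixed  = c("17-3-2015", "26-3-2015", "26-3-2015",
                  "29-4-2015", "25-6-2015", "30-7-2015"),
  lightbulb   = c("1", "1", "1", "2", "2", "2"),
  location    = c("A", "A", "A", "B", "B", "B")
)

# Parse day-month-year strings into Date objects
ds[, date_broken := as.Date(date_broken, format = "%d-%m-%Y")]
ds[, date_fixed  := as.Date(date_fixed,  format = "%d-%m-%Y")]
setorder(ds, lightbulb, date_broken)

# Observed lifetimes: previous fix date -> this break date, within a lightbulb.
# The first row per bulb has no known start, so it is dropped.
ds[, start := shift(date_fixed, 1L, type = "lag"), by = lightbulb]
observed <- ds[!is.na(start),
               .(lightbulb, location, start, end = date_broken, event = 1L)]

# Censored lifetimes: last fix date -> assumed end of observation
study_end <- as.Date("2015-12-31")  # hypothetical cut-off, replace with the real one
censored <- ds[, .SD[.N], by = lightbulb][
  , .(lightbulb, location, start = date_fixed, end = study_end, event = 0L)]

# Combine and compute the duration in days
lifetimes <- rbind(observed, censored)
lifetimes[, time := as.numeric(difftime(end, start, units = "days"))]
setorder(lifetimes, lightbulb, start)
lifetimes
```

For the sample data, the observed lifetimes come out as 2, 0, 51 and 14 days. The time and event columns can then be passed to something like survival::Surv(time, event). Note that the zero-day lifetime (a bulb that broke on the day it was fixed) may need attention, since some models may not accept zero-length times.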