I am a beginner in R, and I would like to do a survival analysis on a dataset about light bulbs. To do that I need the lifetime of each light bulb, so I need to calculate the time between date_fixed in one row and date_broken in the next row, for example between date_fixed in row 1 and date_broken in row 2.
I know I can use difftime(time1, time2, units = "days") to calculate the time between date_broken and date_fixed in the same row, but that gives the time the light bulb was broken, which is not what I am interested in.
I have provided a small sample of my data below. For each light bulb at a particular location I have the date it broke and the date it was fixed.
(Besides the columns given in the example below, I have other features that should have predictive value.)
#   date_broken date_fixed lightbulb location
# 1   26-2-2015  17-3-2015         1        A
# 2   19-3-2015  26-3-2015         1        A
# 3   26-3-2015  26-3-2015         1        A
# 4   17-4-2015  29-4-2015         2        B
# 5   19-6-2015  25-6-2015         2        B
# 6    9-7-2015  30-7-2015         2        B
ds <- data.frame(
  date_broken = c("26-2-2015", "19-3-2015", "26-3-2015",
                  "17-4-2015", "19-6-2015", "9-7-2015"),
  date_fixed  = c("17-3-2015", "26-3-2015", "26-3-2015",
                  "29-4-2015", "25-6-2015", "30-7-2015"),
  lightbulb   = c("1", "1", "1", "2", "2", "2"),
  location    = c("A", "A", "A", "B", "B", "B")
)
First you'll need to fix your dates, as @Gaurav suggested. Then you'll need to group by lightbulb, or the differences will be taken across different bulbs and be meaningless.
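As a minimal base R sketch of the date-fixing step, assuming the strings are in day-month-year order as in the sample (the vector name dates_raw is only for illustration):

```r
# Two values copied from the sample data, in day-month-year format
dates_raw <- c("26-2-2015", "17-3-2015")

# as.Date() needs an explicit format for d-m-Y strings;
# without it the strings are not parsed as intended
dates <- as.Date(dates_raw, format = "%d-%m-%Y")

# Once parsed, the values are Date objects and can be subtracted
class(dates)
diff(dates)
```

Without this conversion the columns stay character (or factor) and date arithmetic on them will not work.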
I present here an alternative using the packages lubridate and data.table:
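library(lubridate)
library(data.table)

# Parse the day-month-year strings into Date objects
ds$date_broken <- dmy(ds$date_broken)
ds$date_fixed <- dmy(ds$date_fixed)

# Convert to a data.table by reference
setDT(ds)

# Per bulb: this row's date_fixed minus the previous row's date_broken
ds[, dt := difftime(date_fixed, shift(date_broken, 1L, type = "lag"), units = "days"), by = lightbulb]
ds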
Which produces:
##    date_broken date_fixed lightbulb location      dt
## 1:  2015-02-26 2015-03-17         1        A NA days
## 2:  2015-03-19 2015-03-26         1        A 28 days
## 3:  2015-03-26 2015-03-26         1        A  7 days
## 4:  2015-04-17 2015-04-29         2        B NA days
## 5:  2015-06-19 2015-06-25         2        B 69 days
## 6:  2015-07-09 2015-07-30         2        B 41 days
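Note that dt here runs from the previous row's date_broken to the current row's date_fixed. If the lifetime is instead defined as in the question (from the previous fix to the next break), the two columns can be swapped. The following is a sketch under that assumption; the column name lifetime is hypothetical, and the rows are assumed to be in chronological order within each bulb:

```r
library(data.table)

# Rebuild the sample data with parsed dates so the block runs on its own
ds <- data.table(
  date_broken = as.Date(c("26-2-2015", "19-3-2015", "26-3-2015",
                          "17-4-2015", "19-6-2015", "9-7-2015"),
                        format = "%d-%m-%Y"),
  date_fixed  = as.Date(c("17-3-2015", "26-3-2015", "26-3-2015",
                          "29-4-2015", "25-6-2015", "30-7-2015"),
                        format = "%d-%m-%Y"),
  lightbulb   = c("1", "1", "1", "2", "2", "2")
)

# shift() looks at the previous row, so order rows by date within each bulb
setorder(ds, lightbulb, date_broken)

# lifetime = this row's date_broken minus the previous row's date_fixed, per bulb
ds[, lifetime := difftime(date_broken,
                          shift(date_fixed, 1L, type = "lag"),
                          units = "days"),
   by = lightbulb]
ds
```

On the sample data this gives 2 and 0 days for bulb 1 and 51 and 14 days for bulb 2. The first row of each bulb is NA, because no earlier fix date is recorded for it.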
In future, it helps a lot if you include the expected results along with your question.