Search code examples
rdatedataframerecode

How to compute a new variable based on the number of days since a particular type of record


I'm trying to create a variable that shows the number of days since a particular event occurred. This is a follow up to this previous question, using the same data.

The data looks like this (note dates are in DD-MM-YYYY format):

ID  date      drug  score
A   28/08/2016  2   3
A   29/08/2016  1   4
A   30/08/2016  2   4
A   2/09/2016   2   4
A   3/09/2016   1   4
A   4/09/2016   2   4
B   8/08/2016   1   3
B   9/08/2016   2   4
B   10/08/2016  2   3
B   11/08/2016  1   3
C   30/11/2016  2   4
C   2/12/2016   1   5
C   3/12/2016   2   1
C   5/12/2016   1   4
C   6/12/2016   2   4
C   8/12/2016   1   2
C   9/12/2016   1   2 

For 'drug': 1=drug taken, 2=no drug taken.

Each time the value of drug is 1, if that ID has a previous record that is also drug==1, then I need to generate a new value 'lagtime' that shows the number of days (not the number of rows!) since the previous time the drug was taken.

So the output I am looking for is:

ID  date      drug  score  lagtime
A   28/08/2016  2   3
A   29/08/2016  1   4
A   30/08/2016  2   4
A   2/09/2016   2   4
A   3/09/2016   1   4      5
A   4/09/2016   2   4
B   8/08/2016   1   3
B   9/08/2016   2   4
B   10/08/2016  2   3
B   11/08/2016  1   3      3
C   30/11/2016  2   4
C   2/12/2016   1   5
C   3/12/2016   2   1
C   5/12/2016   1   4      3
C   6/12/2016   2   4
C   8/12/2016   1   2      3
C   9/12/2016   1   2      1

So I need a way to generate (mutate?) this lagtime score that is calculated as the date for each drug==1 record, minus the date of the previous drug==1 record, grouped by ID. This has me completely bamboozled.

Here's code for the example data:

data<-data.frame(ID=c("A","A","A","A","A","A","B","B","B","B","C","C","C","C","C","C","C"),
                 date=as.Date(c("28/08/2016","29/08/2016","30/08/2016","2/09/2016","3/09/2016","4/09/2016","8/08/2016","9/08/2016","10/08/2016","11/08/2016","30/11/2016","2/12/2016","3/12/2016","5/12/2016","6/12/2016","8/12/2016","9/12/2016"),format= "%d/%m/%Y"),
                 drug=c(2,1,2,2,1,2,1,2,2,1,2,1,2,1,2,1,1),
                 score=c(3,4,4,4,4,4,3,4,3,3,4,5,1,4,4,2,2))

Solution

  • We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(data)), grouped by 'ID', specify the i (drug ==1), get the difference of 'date' (diff(date)), concatenate with NA as the diff output length is 1 less than the original vector, convert to integer and assign (:=) to create the 'lagtime'. By default, all other values will be NA

    library(data.table)
    setDT(data)[drug==1, lagtime := as.integer(c(NA, diff(date))), ID]
    data
    #    ID       date drug score lagtime
    # 1:  A 2016-08-28    2     3      NA
    # 2:  A 2016-08-29    1     4      NA
    # 3:  A 2016-08-30    2     4      NA
    # 4:  A 2016-09-02    2     4      NA
    # 5:  A 2016-09-03    1     4       5
    # 6:  A 2016-09-04    2     4      NA
    # 7:  B 2016-08-08    1     3      NA
    # 8:  B 2016-08-09    2     4      NA
    # 9:  B 2016-08-10    2     3      NA
    #10:  B 2016-08-11    1     3       3
    #11:  C 2016-11-30    2     4      NA
    #12:  C 2016-12-02    1     5      NA
    #13:  C 2016-12-03    2     1      NA
    #14:  C 2016-12-05    1     4       3
    #15:  C 2016-12-06    2     4      NA
    #16:  C 2016-12-08    1     2       3
    #17:  C 2016-12-09    1     2       1