Search code examples
rcontentoffset

offset date column content by id


I have a dataset as follows

 Id     Date1                
 121    2011-01-03
 121    2011-01-03
 121    2011-04-02
 121    2011-08-14
 121    2012-01-14
 121    2012-05-12
 975    2011-02-01
 975    2011-02-01
 975    2011-06-14
 975    2012-01-06
 975    2012-04-19
 975    2012-09-25

What I want to create is an output like this below, where the new Date2 column is offset by one value based on the id,

 Id     Date1        Date2                         
 121    2011-01-03   2011-01-03
 121    2011-01-03   2011-04-02
 121    2011-04-02   2011-08-14
 121    2011-08-14   2012-01-14
 121    2012-01-14   2012-05-12 
 121    2012-05-12   NA

 975    2011-02-01   2011-02-01
 975    2011-02-01   2011-06-14
 975    2011-06-14   2012-01-06
 975    2012-01-06   2012-04-19
 975    2012-04-19   2012-09-25
 975    2012-09-25   NA

Date2 column row 2 for Id 121 i.e 2011-01-03 becomes Date1 column, row1 value for Id 121.

Date2 column row3 for Id 121 i.e 2011-04-02 becomes Date1 column, row2 value for Id 121....so on...This should happen by id.

Any help is appreciated.


Solution

  • Using dplyr, we can group by 'Id' and create a new column 'Date2' using mutate and lead

    library(dplyr)
    df1 %>%
         group_by(Id) %>% 
         mutate(Date2= lead(Date1))
    # Id      Date1      Date2
    #1  121 2011-01-03 2011-01-03
    #2  121 2011-01-03 2011-04-02
    #3  121 2011-04-02 2011-08-14
    #4  121 2011-08-14 2012-01-14
    #5  121 2012-01-14 2012-05-12
    #6  121 2012-05-12         NA
    #7  975 2011-02-01 2011-02-01
    #8  975 2011-02-01 2011-06-14
    #9  975 2011-06-14 2012-01-06
    #10 975 2012-01-06 2012-04-19
    #11 975 2012-04-19 2012-09-25
    #12 975 2012-09-25         NA
    

    Or a similar option using the devel version of data.table will be converting the 'data.frame' to 'data.table' (setDT(df1)), group by 'Id', and use the shift function with option type='lead'

    library(data.table)#v1.9.5+
    setDT(df1)[, Date2:= shift(Date1, type='lead') , by = Id][]
    #      Id      Date1      Date2
    # 1: 121 2011-01-03 2011-01-03
    # 2: 121 2011-01-03 2011-04-02
    # 3: 121 2011-04-02 2011-08-14
    # 4: 121 2011-08-14 2012-01-14
    # 5: 121 2012-01-14 2012-05-12
    # 6: 121 2012-05-12         NA
    # 7: 975 2011-02-01 2011-02-01
    # 8: 975 2011-02-01 2011-06-14
    # 9: 975 2011-06-14 2012-01-06
    #10: 975 2012-01-06 2012-04-19
    #11: 975 2012-04-19 2012-09-25
    #12: 975 2012-09-25         NA
    

    Or we can use ave from base R. We group by 'Id' column, remove the first observation and concatenate with NA at the end.

    df1$Date2 <- with(df1, ave(Date1, Id, FUN=function(x) c(x[-1], NA)))