Search code examples
rdata.tablerangedata-manipulation

Repeat rows using variable date range in data table column in R


I have a data.table in R where two columns are start & stop dates. I need to repeat each row using its corresponding date range.

library(data.table)
input = data.table(X=c('John','John','Peter'),Age=c(20,25,12),StartDate=as.Date(c('2022-01-20','2018-12-03','2023-01-05')),StopDate=as.Date(c('2022-01-23','2018-12-07','2023-01-05')))

X     | Age| StartDate  | StopDate              X     | Age| Date
John  | 20 | 2022-01-20 | 2022-01-23    -->     John  | 20 | 2022-01-20
John  | 25 | 2018-12-03 | 2018-12-07            John  | 20 | 2022-01-21
Peter | 12 | 2023-01-05 | 2023-01-05            John  | 20 | 2022-01-22
                                                John  | 20 | 2022-01-23
                                                John  | 25 | 2018-12-03
                                                John  | 25 | 2018-12-04
                                                John  | 25 | 2018-12-05
                                                John  | 25 | 2018-12-06
                                                John  | 25 | 2018-12-07
                                                Peter | 15 | 2023-01-05
          

Solution

  • library(data.table)
    input = data.table(X=c('John','John','Peter'),Age=c(20,25,12),StartDate=as.Date(c('2022-01-20','2018-12-03','2023-01-05')),StopDate=as.Date(c('2022-01-23','2018-12-07','2023-01-05')))
    
    input[,.(X, Age, Date = seq.Date(StartDate, StopDate, 1)), by=.(n=row.names(input))][,-1]
    #>         X Age       Date
    #>  1:  John  20 2022-01-20
    #>  2:  John  20 2022-01-21
    #>  3:  John  20 2022-01-22
    #>  4:  John  20 2022-01-23
    #>  5:  John  25 2018-12-03
    #>  6:  John  25 2018-12-04
    #>  7:  John  25 2018-12-05
    #>  8:  John  25 2018-12-06
    #>  9:  John  25 2018-12-07
    #> 10: Peter  12 2023-01-05