
Divide a dataframe per school year


I have a data frame containing multiple entries for survival analysis. I would like to include a time-varying covariate, namely the student's class (school year). For example, I have a student who entered the study on 2008-12-09 and left it almost 6 years later.

I am wondering whether there is a smart way to split his entry into multiple rows, based on how many times he "crossed" August 1st and changed class.

For example, I would like to convert the following data frame

d <- data.frame(RandomID = 3350, injury = 0,
                enter = as.Date("2008-12-09", format = "%Y-%m-%d"),
                exit  = as.Date("2014-07-02", format = "%Y-%m-%d"),
                injury_nb = 0, class = 0)

d

  RandomID injury      enter       exit injury_nb class
1     3350      0 2008-12-09 2014-07-02         0     0

into the following

  RandomID injury      enter       exit injury_nb class
1     3350      0 2008-12-09 2009-07-31         0     0
2     3350      0 2009-08-01 2010-07-31         0     1
3     3350      0 2010-08-01 2011-07-31         0     2
4     3350      0 2011-08-01 2012-07-31         0     3
5     3350      0 2012-08-01 2013-07-31         0     4
6     3350      0 2013-08-01 2014-07-02         0     5

Note that I want to keep the other information constant (e.g. RandomID and injury_nb), and that the enter and exit dates are arbitrary.
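A minimal base-R sketch of the first step, finding which August 1st dates a single entry crosses. `aug1_crossings` is a hypothetical helper name, and it assumes that a "crossing" means an August 1st strictly after `enter` and no later than `exit`. The number of dates it returns is the number of class changes, so the entry splits into that many rows plus one.

```r
# Hypothetical helper: the August 1st dates an entry crosses,
# i.e. those strictly after `enter` and no later than `exit`
aug1_crossings <- function(enter, exit) {
  enter <- as.Date(enter)
  exit  <- as.Date(exit)
  # Candidate boundaries: August 1st of every calendar year the entry touches
  years <- as.integer(format(enter, "%Y")):as.integer(format(exit, "%Y"))
  cuts  <- as.Date(paste0(years, "-08-01"))
  cuts[cuts > enter & cuts <= exit]
}

b <- aug1_crossings("2008-12-09", "2014-07-02")
b           # 2009-08-01 through 2013-08-01
length(b)   # 5 class changes, hence 6 rows (classes 0 to 5)
```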

Best regards,

Alex


Solution

  • This could be an option (not very elegant, but it works):

    library(splitstackshape)

    # Hard-coded, comma-separated school-year start dates, end dates and classes
    d$enter = paste(c("2008-12-09", as.character(seq(as.Date("2009-08-01"), as.Date("2013-08-01"), "years"))), collapse = ",")
    d$exit  = paste(c(as.character(seq(as.Date("2009-07-31"), as.Date("2013-07-31"), "years")), "2014-07-02"), collapse = ",")
    d$class = paste(seq(0, 5, by = 1), collapse = ",")

    # Split the comma-separated columns into one row per school year
    cSplit(d, c('enter', 'exit', 'class'), ',', 'long')
    
    #   RandomID injury      enter       exit injury_nb class
    #1:     3350      0 2008-12-09 2009-07-31         0     0
    #2:     3350      0 2009-08-01 2010-07-31         0     1
    #3:     3350      0 2010-08-01 2011-07-31         0     2
    #4:     3350      0 2011-08-01 2012-07-31         0     3
    #5:     3350      0 2012-08-01 2013-07-31         0     4
    #6:     3350      0 2013-08-01 2014-07-02         0     5
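The answer above hard-codes the dates for one student. Below is a base-R sketch that instead derives the August 1st boundaries from each row's own `enter` and `exit`, so it also handles arbitrary dates and data frames with several students. `split_school_years` is a hypothetical helper name; it assumes school years start on August 1st, that `enter`/`exit` are Date columns, and that `class` counts up from the row's existing `class` value (or from 0 if there is none), as in the desired output.

```r
# Hypothetical helper: split every row at each August 1st it crosses.
# Assumes `enter`/`exit` are Dates and that school years start on August 1st.
split_school_years <- function(d) {
  pieces <- lapply(seq_len(nrow(d)), function(i) {
    row   <- d[i, , drop = FALSE]
    enter <- as.Date(row$enter)
    exit  <- as.Date(row$exit)
    # August 1st boundaries strictly after enter and no later than exit
    years <- as.integer(format(enter, "%Y")):as.integer(format(exit, "%Y"))
    cuts  <- as.Date(paste0(years, "-08-01"))
    cuts  <- cuts[cuts > enter & cuts <= exit]
    # Each piece starts at enter or a boundary and ends the day before
    # the next boundary (or at exit for the last piece)
    starts <- c(enter, cuts)
    ends   <- c(cuts - 1, exit)
    # Copy the row once per piece so RandomID, injury, injury_nb stay constant
    out <- row[rep(1, length(starts)), , drop = FALSE]
    out$enter <- starts
    out$exit  <- ends
    # Class counts up from the row's existing class (0 if there is none)
    first_class <- if ("class" %in% names(row)) row$class else 0
    out$class <- first_class + seq_along(starts) - 1
    out
  })
  res <- do.call(rbind, pieces)
  rownames(res) <- NULL
  res
}

# The question's row plus a second, made-up row for illustration
d <- data.frame(RandomID  = c(3350, 9999), injury = c(0, 0),
                enter     = as.Date(c("2008-12-09", "2010-09-15")),
                exit      = as.Date(c("2014-07-02", "2011-03-01")),
                injury_nb = c(0, 0))
split_school_years(d)
```

The first row expands into the six school-year rows shown above, while the made-up second row crosses no August 1st and stays a single row with class 0.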