Search code examples
rreplication

repeating a number n times, then a different number for the remainder of the rows in R


I have a dataset where each row is a monthly observation of patients, monitoring whether they tested positive to a disease (status).

I know which month (i.e. row number for each ID - TimeToDx) they were diagnosed, and what I would like to do is have a binary indicator that switches from 0 to 1, starting from the observation month indicated in TimeToDx.

Basically I need to replicate 0 by the TimeToDx - 1, then for the remainder of the rows replicate 1 for each ID.

Here is some example data - without the status indicator filled:

 ID TimeToDx    Status
10425   2   
10425   2   
10425   2   
10425   2   
10667   3   
10667   3   
10667   3   
10667   3   
10667   3   
10686   2   
10686   2   
10686   2   
10686   2   
10686   2   
17096   5   
17096   5   
17096   5   
17096   5   
17096   5   

Here is what I would like to see:

ID  TimeToDx    Status
10425   2       0
10425   2       1
10425   2       1
10425   2       1
10667   3       0
10667   3       0
10667   3       1
10667   3       1
10667   3       1
10686   2       0
10686   2       1
10686   2       1
10686   2       1
10686   2       1
17096   5       0
17096   5       0
17096   5       0
17096   5       0
17096   5       1

Any help would be much appreciated.


Solution

  • df <- structure(list(ID = c(10425L, 10425L, 10425L, 10425L, 10667L, 
                                10667L, 10667L, 10667L, 10667L, 10686L, 10686L, 10686L, 10686L, 
                                10686L, 17096L, 17096L, 17096L, 17096L, 17096L), 
                         TimeToDx = c(2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 2L,
                                      2L, 2L, 2L, 5L, 5L, 5L, 5L, 5L)),
                    class = "data.frame", row.names = c(NA, -19L))
    
    library(tidyverse)
    df %>%
      group_by(ID) %>% 
      mutate(Status = +((row_number() %/% TimeToDx) != 0)) %>% 
      ungroup()
    #> # A tibble: 19 x 3
    #>       ID TimeToDx id_row
    #>    <int>    <int>  <int>
    #>  1 10425        2      0
    #>  2 10425        2      1
    #>  3 10425        2      1
    #>  4 10425        2      1
    #>  5 10667        3      0
    #>  6 10667        3      0
    #>  7 10667        3      1
    #>  8 10667        3      1
    #>  9 10667        3      1
    #> 10 10686        2      0
    #> 11 10686        2      1
    #> 12 10686        2      1
    #> 13 10686        2      1
    #> 14 10686        2      1
    #> 15 17096        5      0
    #> 16 17096        5      0
    #> 17 17096        5      0
    #> 18 17096        5      0
    #> 19 17096        5      1
    

    Created on 2021-10-21 by the reprex package (v2.0.1)

    data.table

    using @Jon Spring's solution

    library(data.table)
    setDT(df)[, Status := +(rowid(ID) >= TimeToDx)][]
    #>        ID TimeToDx Status
    #>  1: 10425        2      0
    #>  2: 10425        2      1
    #>  3: 10425        2      1
    #>  4: 10425        2      1
    #>  5: 10667        3      0
    #>  6: 10667        3      0
    #>  7: 10667        3      1
    #>  8: 10667        3      1
    #>  9: 10667        3      1
    #> 10: 10686        2      0
    #> 11: 10686        2      1
    #> 12: 10686        2      1
    #> 13: 10686        2      1
    #> 14: 10686        2      1
    #> 15: 17096        5      0
    #> 16: 17096        5      0
    #> 17: 17096        5      0
    #> 18: 17096        5      0
    #> 19: 17096        5      1
    

    Created on 2021-10-21 by the reprex package (v2.0.1)