Tags: r, if-statement, dplyr, lag, shift

Copying values to the next row in time based on a third column in R


I'm trying to create a new column, smk_R, from the data below. For each ID, I have two types of rows over time. Type 1 rows are my anchors and will be kept for later analysis. The information in Type 0 rows is also important and should be carried forward to the next Type 1 row in time within each ID. Essentially, I want to know whether a person smoked a cigarette between two Type 1 assessments (smk=0 for no, smk=1 for yes). If they did, the next Type 1 assessment should have smk_R=1 even if smk=0 at that assessment itself. Any thoughts on how to do this would be much appreciated. My data don't include the variable grp, but if it can be created from dat1, I think I can take the max of smk within each group to get smk_R.

ID<-c(5,5,5,5,5,5,5,5,5,5,5,5,5,5,9,9,9,9,9,9,9,9,9,9,9,9,9,9)
time<-c(0.16,0.35,0.72,1.17,1.19,1.19,1.65,1.99,2.2,2.37,2.78,3.57,3.88,4.12,0.29,0.35,0.79,1.17,1.29,1.29,1.75,1.96,2.27,2.57,2.78,3.57,4.88,5.12)
type<-c(0,1,0,1,0,1,0,0,0,0,0,1,1,1,0,1,0,1,0,1,0,0,0,0,0,1,1,1)
smk<-c(1,0,0,0,0,1,1,1,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,1)
grp<-c(1,1,2,2,3,3,4,4,4,4,4,4,5,6,1,1,2,2,3,3,4,4,4,4,4,4,5,6)
smk_R<-c(1,1,0,0,1,1,1,1,1,1,1,1,0,1,1,1,0,0,1,1,0,0,0,0,0,0,1,1)

dat1<-cbind.data.frame(ID,time,type,smk)
dat1

   ID time type smk
1   5 0.16    0   1
2   5 0.35    1   0
3   5 0.72    0   0
4   5 1.17    1   0
5   5 1.19    0   0
6   5 1.19    1   1
7   5 1.65    0   1
8   5 1.99    0   1
9   5 2.20    0   1
10  5 2.37    0   0
11  5 2.78    0   0
12  5 3.57    1   0
13  5 3.88    1   0
14  5 4.12    1   0
15  9 0.29    0   1
16  9 0.35    1   0
17  9 0.79    0   0
18  9 1.17    1   0
19  9 1.29    0   0
20  9 1.29    1   1
21  9 1.75    0   0
22  9 1.96    0   0
23  9 2.27    0   0
24  9 2.57    0   0
25  9 2.78    0   0
26  9 3.57    1   0
27  9 4.88    1   1
28  9 5.12    1   1

dat2<-cbind.data.frame(dat1,grp,smk_R)
dat2
    ID time type smk grp smk_R
 1   5 0.16    0   1   1     1
 2   5 0.35    1   0   1     1
 3   5 0.72    0   0   2     0
 4   5 1.17    1   0   2     0
 5   5 1.19    0   0   3     1
 6   5 1.19    1   1   3     1
 7   5 1.65    0   1   4     1
 8   5 1.99    0   1   4     1
 9   5 2.20    0   1   4     1
 10  5 2.37    0   0   4     1
 11  5 2.78    0   0   4     1
 12  5 3.57    1   0   4     1
 13  5 3.88    1   0   5     0
 14  5 4.12    1   0   6     1
 15  9 0.29    0   1   1     1
 16  9 0.35    1   0   1     1
 17  9 0.79    0   0   2     0
 18  9 1.17    1   0   2     0
 19  9 1.29    0   0   3     1
 20  9 1.29    1   1   3     1
 21  9 1.75    0   0   4     0
 22  9 1.96    0   0   4     0
 23  9 2.27    0   0   4     0
 24  9 2.57    0   0   4     0
 25  9 2.78    0   0   4     0
 26  9 3.57    1   0   4     0
 27  9 4.88    1   1   5     1
 28  9 5.12    1   1   6     1

Solution

  • The approach you describe (create grp, then take the max of smk within each group) looks good. You could do, for example:

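    library(dplyr)
    
    dat2 <- dat1 %>%
      # when times tie, put the Type 0 row ahead of the Type 1 row
      arrange(ID, time, type) %>%
      group_by(ID) %>%
      # a new group starts on the row after each Type 1 row
      mutate(grp = cumsum(c(1, type[-n()]))) %>%
      group_by(ID, grp) %>%
      # 1 if any row in the group reported smoking, otherwise 0
      mutate(smk_R = max(smk)) %>%
      ungroup()
    
    as.data.frame(dat2)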
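
If you would rather not depend on dplyr, the same idea can be sketched in base R. This is not part of the answer above, only an illustration of the same grouping logic using `ave()`. The names `key` and `anchors` are made up for this example, and the final subsetting step assumes that you only want the Type 1 rows for the later analysis, as the question suggests.

```r
# Base-R sketch of the same logic; the data are copied from the question.
ID   <- c(5,5,5,5,5,5,5,5,5,5,5,5,5,5,9,9,9,9,9,9,9,9,9,9,9,9,9,9)
time <- c(0.16,0.35,0.72,1.17,1.19,1.19,1.65,1.99,2.2,2.37,2.78,3.57,3.88,4.12,
          0.29,0.35,0.79,1.17,1.29,1.29,1.75,1.96,2.27,2.57,2.78,3.57,4.88,5.12)
type <- c(0,1,0,1,0,1,0,0,0,0,0,1,1,1,0,1,0,1,0,1,0,0,0,0,0,1,1,1)
smk  <- c(1,0,0,0,0,1,1,1,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,1)
dat1 <- data.frame(ID, time, type, smk)

# Sort so that, when times tie, the Type 0 row precedes the Type 1 row.
dat1 <- dat1[order(dat1$ID, dat1$time, dat1$type), ]

# Within each ID, shift `type` down by one row and take the running sum,
# so a new group begins on the row after each Type 1 row.
dat1$grp <- ave(dat1$type, dat1$ID,
                FUN = function(x) cumsum(c(1, head(x, -1))))

# `key` (hypothetical name) identifies one ID/grp block per value.
key <- paste(dat1$ID, dat1$grp)

# smk_R is 1 if any row in the block reported smoking, otherwise 0.
dat1$smk_R <- ave(dat1$smk, key, FUN = max)

# Assumption: only the Type 1 anchor rows are needed afterwards.
anchors <- dat1[dat1$type == 1, ]
anchors
```

One thing to check against your expected output: the last row for ID 5 (time 4.12) sits alone in its group with smk=0, so taking the max within group gives smk_R=0 there, whereas the smk_R vector in the question lists 1 for that row.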