I have the first three columns of data. The yest column represents that id did something on the previous day. I'm trying to go from dat to dat2 by adding a new variable "new" that does three things:
copies the value of yest to the previous day. The days are not always consecutive though. so it should only copy if it is the actual previous (day 2 vs day 3) and not just from the next row to the previous row.
the value of yest should be copied to all rows of new with the same id / day combo
if there is more than one value of yest per id / day combo, they should be averaged before populating the new variable.
I've been trying different ifelse and merge combos that are failing miserably. Any help with this would be much appreciated.
id<-c(1,1,1,1,1,3,3,3,3,3,3,3,8,8,8,8,8)
day<-c(1,2,2,3,5,0,1,2,3,4,5,5,0,3,4,4,5)
yest<-c(NA,1,3,NA,NA,1,2,NA,NA,NA,3,NA,NA,NA,NA,3,4)
dat<-cbind(id,day,yest)
dat
id day yest
[1,] 1 1 NA
[2,] 1 2 1
[3,] 1 2 3
[4,] 1 3 NA
[5,] 1 5 NA
[6,] 3 0 1
[7,] 3 1 2
[8,] 3 2 NA
[9,] 3 3 NA
[10,] 3 4 NA
[11,] 3 5 3
[12,] 3 5 NA
[13,] 8 0 NA
[14,] 8 3 NA
[15,] 8 4 NA
[16,] 8 4 3
[17,] 8 5 4
new<-c(2,NA,NA,NA,NA,2,NA,NA,NA,3,NA,NA,NA,3,4,4,NA)
dat2<-cbind(dat,new)
dat2
id day yest new
[1,] 1 1 NA 2
[2,] 1 2 1 NA
[3,] 1 2 3 NA
[4,] 1 3 NA NA
[5,] 1 5 NA NA
[6,] 3 0 1 2
[7,] 3 1 2 NA
[8,] 3 2 NA NA
[9,] 3 3 NA NA
[10,] 3 4 NA 3
[11,] 3 5 3 NA
[12,] 3 5 NA NA
[13,] 8 0 NA NA
[14,] 8 3 NA 3
[15,] 8 4 NA 4
[16,] 8 4 3 4
[17,] 8 5 4 NA
library(dplyr)
df <- data.frame(
id = c(1,1,1,1,1,3,3,3,3,3,3,3,8,8,8,8,8),
day = c(1,2,2,3,5,0,1,2,3,4,5,5,0,3,4,4,5),
yest = c(NA,1,3,NA,NA,1,2,NA,NA,NA,3,NA,NA,NA,NA,3,4)
)
First you create a group for each combination of day and id. Then you take the mean of yest, making sure to throw out NAs. Then you subtract 1 from day so you can match it up to the correct day in the original data.
df_lag <- df %>%
group_by(id, day) %>%
summarise(new = mean(yest, na.rm = T)) %>%
ungroup() %>%
mutate(day = day-1)
df_lag
# A tibble: 14 x 3 id day new <dbl> <dbl> <dbl> 1 1 0 NaN 2 1 1 2 3 1 2 NaN 4 1 4 NaN 5 3 -1 1 6 3 0 2 7 3 1 NaN 8 3 2 NaN 9 3 3 NaN 10 3 4 3 11 8 -1 NaN 12 8 2 NaN 13 8 3 3 14 8 4 4
Here you join them by day and id, using a left_join
to throw out the days where there isn't a day in the original dataset (i.e. -1 in lines 5 and 11).
left_join(df, df_lag)
id day yest new 1 1 1 NA 2 2 1 2 1 NaN 3 1 2 3 NaN 4 1 3 NA NA 5 1 5 NA NA 6 3 0 1 2 7 3 1 2 NaN 8 3 2 NA NaN 9 3 3 NA NaN 10 3 4 NA 3 11 3 5 3 NA 12 3 5 NA NA 13 8 0 NA NA 14 8 3 NA 3 15 8 4 NA 4 16 8 4 3 4 17 8 5 4 NA