I would like to impute a variable in grouped panel data using tidyverse logic. The background: this is survey data, and in particular years (time) people are asked about their behavior over the last couple of years. So when someone says "I have had a car for 5 years", I assume the car variable can be set to 1 for those years, even though the question was not asked in those years. Below is minimal data together with the imputation I would like to achieve (car_imp_goal).
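library(tidyverse)  # loads dplyr, tidyr (fill) and ggplot2 (cut_interval)

paneldata <- data.frame(
  id = c(rep(1, 10), rep(2, 10)),
  time = rep(1:10, times = 2),
  car = c(1, NA, NA, NA, NA, 0, NA, NA, NA, 1, 1, NA, NA, NA, 1, NA, NA, NA, NA, 1),
  car_imp_goal = c(1, NA, NA, NA, NA, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
)
paneldata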
Here is what I tried:
# copy car, then fill missing values upwards within each person
paneldata <- paneldata %>% mutate(car_imp_trial = car)
paneldata %>% group_by(id) %>% fill(car_imp_trial, .direction = "up")
# A tibble: 20 × 5
# Groups:   id [2]
      id  time   car car_imp_goal car_imp_trial
   <dbl> <int> <dbl>        <dbl>         <dbl>
 1     1     1     1            1             1
 2     1     2    NA           NA             0
 3     1     3    NA           NA             0
 4     1     4    NA           NA             0
 5     1     5    NA           NA             0
 6     1     6     0            0             0
 7     1     7    NA            1             1
 8     1     8    NA            1             1
 9     1     9    NA            1             1
10     1    10     1            1             1
11     2     1     1            1             1
12     2     2    NA            1             1
13     2     3    NA            1             1
14     2     4    NA            1             1
15     2     5     1            1             1
16     2     6    NA            1             1
17     2     7    NA            1             1
18     2     8    NA            1             1
19     2     9    NA            1             1
20     2    10     1            1             1
The past-behavior question is only asked in specific years (e.g. time 5 and 10). I think I need to group_by(id), then use an ifelse condition to select the relevant times (i.e. 5 or 10), and then use fill. What is wrong with car_imp_trial is that it also filled the 0 from year 6 upwards, and year 6 is not a past-behavior question.
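Create a time-interval id (one per five-year block ending in a survey year), then fill the car column upwards within each id and interval:
library(tidyverse)  # cut_interval() comes from ggplot2
paneldata %>%
  group_by(id, id2 = cut_interval(time, length = 5, labels = FALSE)) %>%
  fill(car, .direction = "up")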
# A tibble: 20 × 5
# Groups:   id, id2 [4]
      id  time   car car_imp   id2
   <dbl> <int> <dbl>   <dbl> <int>
 1     1     1     1       1     1
 2     1     2    NA      NA     1
 3     1     3    NA      NA     1
 4     1     4    NA      NA     1
 5     1     5    NA      NA     1
 6     1     6     0       0     2
 7     1     7     1       1     2
 8     1     8     1       1     2
 9     1     9     1       1     2
10     1    10     1       1     2
11     2     1     1       1     1
12     2     2     1       1     1
13     2     3     1       1     1
14     2     4     1       1     1
15     2     5     1       1     1
16     2     6     1       1     2
17     2     7     1       1     2
18     2     8     1       1     2
19     2     9     1       1     2
20     2    10     1       1     2
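If other, non-retrospective observations can fall inside a block (say a regular answer at time 7), filling car directly would also push those values upwards. A possible variant, closer to the ifelse idea from the question, is to copy only the answers from the survey years into a helper column, fill that one upwards, and then combine it with the observed values. This is only a sketch: asked_years, block, retro and car_imp are names made up here, and it assumes the retrospective question was asked at time 5 and 10 and always covers the whole block before it.

```r
library(dplyr)
library(tidyr)

paneldata <- data.frame(
  id = c(rep(1, 10), rep(2, 10)),
  time = rep(1:10, times = 2),
  car = c(1, NA, NA, NA, NA, 0, NA, NA, NA, 1,
          1, NA, NA, NA, 1, NA, NA, NA, NA, 1),
  car_imp_goal = c(1, NA, NA, NA, NA, 0, 1, 1, 1, 1,
                   1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
)

# Assumption: times at which the retrospective question was asked
asked_years <- c(5, 10)

result <- paneldata %>%
  mutate(
    # number of survey years strictly before `time`:
    # times 1-5 fall into block 0, times 6-10 into block 1
    block = findInterval(time, asked_years, left.open = TRUE),
    # keep only the answers given in a survey year
    retro = if_else(time %in% asked_years, car, NA_real_)
  ) %>%
  # grouping by block stops a later survey year from filling
  # into an earlier block whose own answer is missing
  group_by(id, block) %>%
  fill(retro, .direction = "up") %>%
  ungroup() %>%
  # observed values win; the retrospective answer fills the gaps
  mutate(car_imp = coalesce(car, retro)) %>%
  select(-block, -retro)

# compare the imputed column with the target column (NAs included)
identical(result$car_imp, result$car_imp_goal)
```

On the example data this behaves like the cut_interval() version, because the only values inside a block that are not from a survey year sit at the start of the block, where filling upwards cannot spread them.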