My data has multiple observations for each ID. At the ID level, I want to convert all values to the most recent non-missing value. I have tried using mutate, group_by(id) and which.max(year) unsuccessfully.
Data:
data <- data.frame(
id=c(1,1,2,2,3,3,4,4,5,5),
year=rep(c(2010, 2011), 5),
employ=c("yes", "yes", "no", "yes", "yes", "no", NA, "yes", "no", NA))
> data
id year employ
1 1 2010 yes
2 1 2011 yes
3 2 2010 no
4 2 2011 yes
5 3 2010 yes
6 3 2011 no
7 4 2010 <NA>
8 4 2011 yes
9 5 2010 no
10 5 2011 <NA>
Desired output:
data2 <- data.frame(
id=c(1,1,2,2,3,3,4,4,5,5),
year=c(2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2010, 2010),
employ=c("yes", "yes", "yes", "yes", "no", "no","yes", "yes","no", "no"))
> data2
id year employ
1 1 2011 yes
2 1 2011 yes
3 2 2011 yes
4 2 2011 yes
5 3 2011 no
6 3 2011 no
7 4 2011 yes
8 4 2011 yes
9 5 2010 no
10 5 2010 no
A data.table
option
setDT(data)[, employ := last(na.omit(employ[order(year)])), id]
gives
id year employ
1: 1 2010 yes
2: 1 2011 yes
3: 2 2010 yes
4: 2 2011 yes
5: 3 2010 no
6: 3 2011 no
7: 4 2010 yes
8: 4 2011 yes
9: 5 2010 no
10: 5 2011 no
A dplyr
way might be
data %>%
group_by(id) %>%
mutate(employ = last(na.omit(employ[order(year)])))
which gives
id year employ
<dbl> <dbl> <chr>
1 1 2010 yes
2 1 2011 yes
3 2 2010 yes
4 2 2011 yes
5 3 2010 no
6 3 2011 no
7 4 2010 yes
8 4 2011 yes
9 5 2010 no
10 5 2011 no