I have a data cleaning question. Data were collected up to three times per student, and some entries were keyed in incorrectly. So if a student has more than one record, the value from the last data point needs to be copied over the earlier ones.
Here is what my dataset looks like:
df <- data.frame(id = c(1,1,1, 2,2,2, 3,3,3, 4),
text = c("female","male","male", "female","female","female", "male","female","female", "male"),
time = c("first","second","third", "first","second","third", "first","second","third", "first"))
> df
id text time
1 1 female first
2 1 male second
3 1 male third
4 2 female first
5 2 female second
6 2 female third
7 3 male first
8 3 female second
9 3 female third
10 4 male first
So the first and third students have inconsistent gender information because of wrong input. I need the value from the last time point (third) copied over the rest of that student's rows.
The desired output would be
> df1
id text time
1 1 male first
2 1 male second
3 1 male third
4 2 female first
5 2 female second
6 2 female third
7 3 female first
8 3 female second
9 3 female third
10 4 male first
Any ideas? Thanks!
We could use last() to return the last value of 'text' within each group (in row order); that single value gets recycled to update the whole column in mutate().
library(dplyr)
df <- df %>%
group_by(id) %>%
mutate(text = last(text)) %>%
ungroup()
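Note that last() depends on the rows already being in time order within each id. If that is not guaranteed, here is a minimal sketch that makes the ordering explicit. The wave_order vector is my assumption about how the 'time' labels should be ranked; it is not something the question defines.

```r
library(dplyr)

df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4),
  text = c("female", "male", "male", "female", "female", "female",
           "male", "female", "female", "male"),
  time = c("first", "second", "third", "first", "second", "third",
           "first", "second", "third", "first"),
  stringsAsFactors = FALSE
)

# Assumed ranking of the collection waves (hypothetical name, not from the question)
wave_order <- c("first", "second", "third")

df1 <- df %>%
  group_by(id) %>%
  # within each id, take 'text' from the row with the latest wave;
  # the single value is recycled across the group's rows
  mutate(text = text[which.max(match(time, wave_order))]) %>%
  ungroup()
```

With the example data this should give the same result as the last() version, but it would also hold if the rows were shuffled.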
If we want the second or third value instead, use nth() and set its n argument to the minimum of 2 and the group size n(), so that groups with fewer than 2 elements fall back to their last value.
df %>%
group_by(id) %>%
mutate(text = nth(text, min(c(2, n()))))
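To see what the min(c(2, n())) cap does, here is the same indexing logic as a small base R sketch. The helper name nth_or_last is made up for illustration and is not part of dplyr.

```r
# Hypothetical helper: k-th element of x, or the last element when x is shorter than k
nth_or_last <- function(x, k) {
  x[min(k, length(x))]
}

# Group of size 3: index 2 exists, so the second value is returned
nth_or_last(c("female", "male", "male"), 2)

# Group of size 1: the index is capped at 1, so the only value is returned
nth_or_last(c("male"), 2)
```

Without the cap, a group with a single row (like id 4) would ask for a position that does not exist and end up with NA.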