I have a data cleaning question. The data collection happened three times and sometimes the data entry was incorrect. Therefore, if the students had their data collected more than one time, the second
data point needs to be copied over.
Here is my dataset looks like:
df <- data.frame(id = c(1,1,1, 2,2,2, 3,3, 4,4, 5),
text = c("female","male","male", "female","female","female", "male","female","male", "female", "female"),
time = c("first","second","third", "first","second","third", "first","second","second", "third", "first"))
> df
id text time
1 1 female first
2 1 male second
3 1 male third
4 2 female first
5 2 female second
6 2 female third
7 3 male first
8 3 female second
9 4 male second
10 4 female third
11 5 female first
So id
1,3, and 4 have incorrect gender information. I need to copy the second
data point when there are multiple/different input about the gender
variable. If there is only one datapoint, that should stay in the dataset.
The desired output would be
> df1
id text time
1 1 male first
2 1 male second
3 1 male third
4 2 female first
5 2 female second
6 2 female third
7 3 female first
8 3 female second
9 4 male second
10 4 male third
11 5 female first
Any ideas? Thanks!
We could use match
library(dplyr)
df %>%
group_by(id) %>%
mutate(text = text[match("second", time, nomatch = 1)]) %>%
ungroup
-output
# A tibble: 11 × 3
id text time
<dbl> <chr> <chr>
1 1 male first
2 1 male second
3 1 male third
4 2 female first
5 2 female second
6 2 female third
7 3 female first
8 3 female second
9 4 male second
10 4 male third
11 5 female first
Or using coalesce
df %>%
group_by(id) %>%
mutate(text = coalesce(text[match("second", time)], text)) %>%
ungroup
-output
# A tibble: 11 × 3
id text time
<dbl> <chr> <chr>
1 1 male first
2 1 male second
3 1 male third
4 2 female first
5 2 female second
6 2 female third
7 3 female first
8 3 female second
9 4 male second
10 4 male third
11 5 female first