I have a dataframe like this:
# A tibble: 4 x 5
category month comment score email
<chr> <chr> <chr> <dbl> <chr>
1 neutro 2020-01 "" 8 xxx
2 promotor 2020-04 "ok" 9 xxx
3 promotor 2020-04 "very cool" 9 xxx
4 promotor 2020-05 "i really liked it" 9 xxx
Unfortunately, the survey had a flaw: a client could answer more than once.
So now I'm trying to keep only the last answer, within each group.
When I use dplyr::distinct(), it keeps the first occurrence:
df %>%
  distinct(category, month, score, email, .keep_all = TRUE)
# A tibble: 3 x 5
category month comment score email
<chr> <chr> <chr> <dbl> <chr>
1 neutro 2020-01 "" 8 xxx
2 promotor 2020-04 "ok" 9 xxx
3 promotor 2020-05 "i really liked it" 9 xxx
But I would like to keep the last one, so this is my desired result:
# A tibble: 3 x 5
category month comment score email
<chr> <chr> <chr> <dbl> <chr>
1 neutro 2020-01 "" 8 xxx
2 promotor 2020-04 "very cool" 9 xxx
3 promotor 2020-05 "i really liked it" 9 xxx
Note: as mentioned in the title, I can't arrange (sort) the data by the grouping columns.
Could you use group_by()?
library(dplyr)
df %>%
  group_by(category, month, score, email) %>% # group_by(across(-comment)) would also work with the example
  slice_tail() %>%
  ungroup()
Output:
# A tibble: 3 x 5
category month comment score email
<fct> <fct> <fct> <int> <fct>
1 neutro 2020-01 "" 8 xxx
2 promotor 2020-04 "very cool" 9 xxx
3 promotor 2020-05 "i really liked it" 9 xxx
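If you would rather not group at all, a base R alternative is duplicated() with fromLast = TRUE. It flags every row that has a later duplicate on the key columns, so dropping the flagged rows keeps the last occurrence and leaves the original row order untouched. This is only a minimal sketch: the data frame is rebuilt by hand from the example in the question, and key_cols and result are names chosen here for illustration.

```r
# Rebuild the example data from the question (base R only, no packages needed)
df <- data.frame(
  category = c("neutro", "promotor", "promotor", "promotor"),
  month    = c("2020-01", "2020-04", "2020-04", "2020-05"),
  comment  = c("", "ok", "very cool", "i really liked it"),
  score    = c(8, 9, 9, 9),
  email    = "xxx",
  stringsAsFactors = FALSE
)

# Columns that define a duplicate answer (everything except the comment)
key_cols <- c("category", "month", "score", "email")

# duplicated(..., fromLast = TRUE) scans from the bottom up, so it marks
# every row that is followed by a later row with the same key values.
# Negating it keeps only the last row of each key combination.
result <- df[!duplicated(df[key_cols], fromLast = TRUE), ]

print(result)
```

Because no sorting or grouping takes place, the surviving rows stay in the order they had in df.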