I'm running correlations between school-leaving scores and university scores. I use top_n to clean the school scores of retakes, but sometimes the poor candidate gets the same score on the retake. How can I eliminate these duplicates?
Studno Math
<int> <int>
1 2105234 53
2 2126745 10
3 2126745 10
4 2110897 41
5 2344567 55
6 2213467 63
7 2314521 67
8 2314521 40
9 2123456 18
10 2123456 45
duppymat1 %>% group_by(Studno) %>% top_n(1, Math)
This eliminates the duplicates where the two scores are different, but how do I code it to drop one of the two rows when the scores are equal?
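I tend to use row_number() == 1 inside filter(). top_n() keeps every row that ties for the top value, whereas row_number() gives tied rows distinct ranks, so exactly one row per group survives.
For example: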
library(tidyverse)
df <- tribble(
~ Studno, ~ Math,
2105234, 53,
2126745, 10,
2126745, 10,
2110897, 41,
2344567, 55,
2213467, 63,
2314521, 67,
2314521, 40,
2123456, 18,
2123456, 45
)
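# top_n() keeps ties, so student 2126745 still appears twice
df %>% group_by(Studno) %>% top_n(1, Math)
# row_number() ranks tied rows distinctly, so one row per student is kept
df %>% group_by(Studno) %>% filter(row_number(desc(Math)) == 1)
The filter() version gives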
# A tibble: 7 x 2
# Groups: Studno [7]
Studno Math
<dbl> <dbl>
1 2105234 53
2 2126745 10
3 2110897 41
4 2344567 55
5 2213467 63
6 2314521 67
7 2123456 45
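As an alternative, here is a minimal sketch assuming dplyr 1.0.0 or later, where top_n() is superseded by slice_max(). Its with_ties argument controls whether tied rows are all kept; setting it to FALSE should leave one row per student. The data are the same as above, and the name best is only illustrative.

```r
library(dplyr)

# Same data as in the example above
df <- tibble(
  Studno = c(2105234, 2126745, 2126745, 2110897, 2344567,
             2213467, 2314521, 2314521, 2123456, 2123456),
  Math   = c(53, 10, 10, 41, 55, 63, 67, 40, 18, 45)
)

# Keep the highest Math score per student; with_ties = FALSE
# returns a single row even when the top scores are equal
best <- df %>%
  group_by(Studno) %>%
  slice_max(Math, n = 1, with_ties = FALSE) %>%
  ungroup()

best
```

Note that slice_max() on grouped data returns rows ordered by group, so the row order may differ from the filter() result even though the same students and scores are kept.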