Is there an R function to use with top_n when scores are equal?


I'm computing correlations between school-leaving scores and university scores. I use top_n to clean the school scores of retakes, keeping each student's best result, but sometimes the poor candidate gets the same score on the retake. How can I eliminate these duplicates?

  Studno  Math
     <int> <int>
 1 2105234    53
 2 2126745    10
 3 2126745    10
 4 2110897    41
 5 2344567    55
 6 2213467    63
 7 2314521    67
 8 2314521    40
 9 2123456    18
10 2123456    45

   duppymat1 %>% group_by(Studno) %>% top_n(1, Math)

This eliminates the duplicates where the two scores differ, but how do I drop one of the two rows when the scores are equal?


Solution

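  • I tend to use row_number() == 1 inside filter(). Unlike top_n(), which keeps every tied row, row_number() gives tied values distinct ranks, so exactly one row per group survives.

    For example: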

    library(tidyverse)

    df <- tribble(
      ~Studno, ~Math,
      2105234, 53,
      2126745, 10,
      2126745, 10,
      2110897, 41,
      2344567, 55,
      2213467, 63,
      2314521, 67,
      2314521, 40,
      2123456, 18,
      2123456, 45
    )

    # top_n() keeps ties, so both rows for student 2126745 remain
    df %>% group_by(Studno) %>% top_n(1, Math)

    # row_number() breaks ties, so only one row per student is kept
    df %>% group_by(Studno) %>% filter(row_number(desc(Math)) == 1)
    
    

    The second pipeline gives

    # A tibble: 7 x 2
    # Groups:   Studno [7]
       Studno  Math
        <dbl> <dbl>
    1 2105234    53
    2 2126745    10
    3 2110897    41
    4 2344567    55
    5 2213467    63
    6 2314521    67
    7 2123456    45
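
As an alternative to the row_number() filter above, here is a minimal sketch using slice_max(), assuming dplyr 1.0.0 or later (where top_n() is superseded). The name `best` is only illustrative and not from the original answer. Setting with_ties = FALSE keeps a single row per student even when the top scores are equal.

```r
library(dplyr)

# Same data as in the question
df <- tibble::tribble(
  ~Studno, ~Math,
  2105234, 53,
  2126745, 10,
  2126745, 10,
  2110897, 41,
  2344567, 55,
  2213467, 63,
  2314521, 67,
  2314521, 40,
  2123456, 18,
  2123456, 45
)

# Keep each student's highest Math score; with_ties = FALSE drops
# the extra row when a retake produced the same score
best <- df %>%
  group_by(Studno) %>%
  slice_max(Math, n = 1, with_ties = FALSE) %>%
  ungroup()

print(best)
```

Note that the rows of a grouped slice_max() result come back ordered by group, so the row order may differ from the filter() output above even though the same seven students and scores are kept.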