Tags: r, dataframe, duplicates, median

remove specific duplicate rows based on median


I currently have a data frame that looks like this:

                result 1   result 2   result 3   median
    item 1             8          7          6        7
    item 5             1          2          3        2
    item 1             6          5          4        5
    item 5             3          4          5        4

I want to remove the duplicate items, keeping for each item only the row with the higher median. The problem is that the item names (item 1, etc.) are row names rather than a column of their own, so I can't access them with $.

How can I accomplish this? Thanks in advance.
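A data frame cannot carry duplicate row names (data.frame() stops with an error), so the table above is easiest to reproduce with the item labels in an ordinary column. The sketch below assumes that column is called row (the name the answer uses) and names the result columns result1 to result3; both are assumptions, not the asker's actual column names.

```r
# Rebuild the example with the item labels in a column named `row`
# (assumed name) instead of as row names.
df <- data.frame(
  row     = c("item1", "item5", "item1", "item5"),
  result1 = c(8, 1, 6, 3),
  result2 = c(7, 2, 5, 4),
  result3 = c(6, 3, 4, 5)
)

# Median of the three results for each row
df$median <- apply(df[, c("result1", "result2", "result3")], 1, median)

# data.frame() rejects duplicate row names, which is why the labels
# cannot stay as row names if they repeat
dup_names <- try(
  data.frame(x = 1:2, row.names = c("item1", "item1")),
  silent = TRUE
)
inherits(dup_names, "try-error")  # TRUE

print(df)
```

If your labels are currently (unique) row names, df$row <- rownames(df) copies them into a column that $ can reach.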


Solution

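  • Sort the rows by median in decreasing order and then drop the duplicates. duplicated() flags every repeat after the first occurrence, so after sorting, the row kept for each item is the one with the highest median. This assumes the item labels are stored in a column called row (a data frame cannot have duplicate row names, so the labels have to live in a column), i.e.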

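    # Sort so the highest median comes first
    df <- df[order(df$median, decreasing = TRUE), ]
    # Keep only the first (highest-median) row for each item
    df[!duplicated(df$row), ]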
    

    which gives:

        row result1 result2 result3 median
    1 item1       8       7       6      7
    4 item5       3       4       5      4
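For a self-contained check, the sketch below rebuilds the example data (labels in an assumed column named row, medians already computed), applies the sort-then-duplicated() approach from the answer, and, as an alternative that is not part of the answer, uses base R's ave() to keep every row whose median equals its item's maximum. The alternative keeps ties instead of picking one of them.

```r
# Example data; `row` and `result1`..`result3` are assumed column names
df <- data.frame(
  row     = c("item1", "item5", "item1", "item5"),
  result1 = c(8, 1, 6, 3),
  result2 = c(7, 2, 5, 4),
  result3 = c(6, 3, 4, 5),
  median  = c(7, 2, 5, 4)
)

# Approach from the answer: highest median first, then keep the first
# occurrence of each label (duplicated() marks later repeats as TRUE)
sorted <- df[order(df$median, decreasing = TRUE), ]
best   <- sorted[!duplicated(sorted$row), ]
print(best)

# Alternative that keeps ties: compare each row's median with the
# maximum median of its own item, computed group-wise by ave()
is_max    <- df$median == ave(df$median, df$row, FUN = max)
best_ties <- df[is_max, ]
print(best_ties)
```

For this data both return the item1 row with median 7 and the item5 row with median 4; they differ only when two rows of the same item share the highest median.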