Extract the unique rows with the maximum value in another column of an R data frame


I have a data frame called mydf. The Sample column contains duplicated samples. For each unique sample, I want to extract the row with the maximum total_reads, giving the result shown below.

mydf<-structure(list(Sample = c("AOGC-02-0188", "AOGC-02-0191", "AOGC-02-0191", 
"AOGC-02-0191", "AOGC-02-0194", "AOGC-02-0194", "AOGC-02-0194"
), total_reads = c(27392583, 19206920, 34462563, 53669483, 24731988, 
43419826, 68151814), Lane = c("4", "5", "4", "4;5", "5", "4", 
"4;5")), .Names = c("Sample", "total_reads", "Lane"), row.names = c("166", 
"169", "170", "171", "173", "174", "175"), class = "data.frame")

Desired result:

 Sample        total_reads  Lane
 AOGC-02-0188  27392583     4
 AOGC-02-0191  53669483     4;5
 AOGC-02-0194  68151814     4;5

Solution

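  • You can aggregate to find the maximum total_reads per Sample, then merge that back onto mydf to recover the Lane column. By default merge() joins on the columns the two data frames share (here Sample and total_reads), so only the rows holding each sample's maximum survive: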

    merge(aggregate(total_reads ~ Sample, mydf, max), mydf)
    #        Sample total_reads Lane
    #1 AOGC-02-0188    27392583    4
    #2 AOGC-02-0191    53669483  4;5
    #3 AOGC-02-0194    68151814  4;5
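
One thing to keep in mind with the aggregate-and-merge approach is that if two rows of the same sample tie on the maximum total_reads, both rows are returned. If exactly one row per sample is needed, a possible base R alternative (not part of the original answer) is to sort so that the largest total_reads comes first within each sample and then drop the later duplicates. The sketch below rebuilds the question's data with data.frame() for brevity; the names sorted and top_rows are illustrative only.

```r
# Same data as in the question, rebuilt with data.frame() for brevity.
mydf <- data.frame(
  Sample = c("AOGC-02-0188", "AOGC-02-0191", "AOGC-02-0191", "AOGC-02-0191",
             "AOGC-02-0194", "AOGC-02-0194", "AOGC-02-0194"),
  total_reads = c(27392583, 19206920, 34462563, 53669483,
                  24731988, 43419826, 68151814),
  Lane = c("4", "5", "4", "4;5", "5", "4", "4;5"),
  row.names = c("166", "169", "170", "171", "173", "174", "175"),
  stringsAsFactors = FALSE
)

# Sort by Sample, then by total_reads in descending order, so the row
# with the maximum total_reads is first within each sample.
sorted <- mydf[order(mydf$Sample, -mydf$total_reads), ]

# duplicated() flags every occurrence of a Sample after the first one,
# so negating it keeps only the top row per sample.
top_rows <- sorted[!duplicated(sorted$Sample), ]
top_rows
```

Unlike merge(), this keeps the original row names (166, 171 and 175 for this data), and ties are resolved by whichever tied row sorts first.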