Summarize data frame based on condition


I have this kind of dataset (ID, V1, V2 are the 3 variables of my data frame):

ID V1 V2 
1  A  10
1  B  5
1  D  1
2  C  9
2  E  8

I would like a new data frame containing, for each ID, the row with the maximum value of V2. For the example above, the result would be:

ID V1 V2 
1  A  10
2  C  9

Solution

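  • This code is somewhat clumsy, but it works. Within each ID group, ave() flags the rows whose V2 equals the group maximum, and those rows are kept: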

    > mydf[with(mydf, ave(V2, ID, FUN = function(x) x == max(x))) == 1, ]
      ID V1 V2
    1  1  A 10
    4  2  C  9
    

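    A less clumsy version uses by() to take the row with the largest V2 in each ID group, then binds the results together: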

    do.call(rbind, 
            by(mydf, mydf$ID, 
               FUN = function(x) x[which.max(x$V2), ]))
    #   ID V1 V2
    # 1  1  A 10
    # 2  2  C  9
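
For anyone who wants to run the above from scratch, here is a minimal self-contained sketch in base R. It rebuilds the example data under the name mydf (the name used in the answer) and uses split() with lapply() in place of by(), which is a swapped-in but equivalent grouping step. The names max_rows and all_max are made up for illustration. Note that which.max() keeps only the first row when several rows tie for the maximum, so a second variant that keeps all tied rows is included.

```r
# Rebuild the example data from the question.
mydf <- data.frame(
  ID = c(1, 1, 1, 2, 2),
  V1 = c("A", "B", "D", "C", "E"),
  V2 = c(10, 5, 1, 9, 8),
  stringsAsFactors = FALSE
)

# Split the rows by ID, take the first row with the largest V2 in each group,
# then bind the per-group rows back into a single data frame.
max_rows <- do.call(
  rbind,
  lapply(split(mydf, mydf$ID), function(g) g[which.max(g$V2), ])
)
rownames(max_rows) <- NULL  # replace the group labels with plain 1..n row names

# Variant that keeps every tied row: compare V2 with its per-group maximum.
all_max <- mydf[mydf$V2 == ave(mydf$V2, mydf$ID, FUN = max), ]

print(max_rows)
#   ID V1 V2
# 1  1  A 10
# 2  2  C  9
```

On this data both variants select the same two rows. They differ only when an ID has several rows sharing the maximum V2, in which case all_max returns all of them and max_rows returns one per ID.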