Search code examples
raggregateplyrgreatest-n-per-group

Aggregate a dataframe on a given column and display another column


I have a dataframe in R of the following form:

> head(data)
  Group Score Info
1     1     1    a
2     1     2    b
3     1     3    c
4     2     4    d
5     2     3    e
6     2     1    f

I would like to aggregate it following the Score column using the max function

> aggregate(data$Score, list(data$Group), max)

  Group.1         x
1       1         3
2       2         4

But I also would like to display the Info column associated to the maximum value of the Score column for each group. I have no idea how to do this. My desired output would be:

  Group.1         x        y
1       1         3        c
2       2         4        d

Any hint?


Solution

  • First, you split the data using split:

    split(z,z$Group)
    

    Than, for each chunk, select the row with max Score:

    lapply(split(z,z$Group),function(chunk) chunk[which.max(chunk$Score),])
    

    Finally reduce back to a data.frame do.calling rbind:

    do.call(rbind,lapply(split(z,z$Group),function(chunk) chunk[which.max(chunk$Score),]))
    

    Result:

      Group Score Info
    1     1     3    c
    2     2     4    d
    

    One line, no magic spells, fast, result has good names =)