Search code examples
rgroupingsubset

How to select row with max value from a group of groups in R?


I have a dataframe with a number of groupings. I want to create a dataframe populated by rows that are the maximum of a given group of groups. Given the dataframe

group unit treatment value         etc
     1    A         w     8       apple
     1    A         x     9        pear
     1    A         y     7      orange
     1    A         z     2        pear
     1    B         w     4  strawberry
     1    B         x     3 dragonfruit
     1    B         y     6   raspberry
     1    B         z     5       apple
     1    C         w    32      banana
     1    C         x    27       peach
     1    C         y    15        plum
     1    C         z    28      orange
     2    A         w    12     apricot
     2    A         x    11  blackberry
     2    A         y    10      banana
     2    A         z     9  raspeberry
     2    B         w     1        plum
     2    B         x     2       lemon
     2    B         y     3  grapefruit
     2    B         z     4       apple
     2    C         w    51         fig
     2    C         x    47     avocado
     2    C         y    68  blackberry
     2    C         z    53 dragonfruit

for each group, for each unit, I would like to select the row with the highest value, such that I would end up with:

group unit treatment value        etc
     1    A         x     9       pear
     1    B         y     6  raspberry
     1    C         w    32     banana
     2    A         w    12    apricot
     2    B         z     4      apple
     2    C         y    68 blackberry

the etc column is just to highlight that I'd like to select the whole row.

I could write a series of nested loops, but there feels like there has to be something more elegant. Happy for base or tidyverse suggestions.


Solution

  • You can do as follows:

    library(dplyr)
    filter(dt, value==max(value), .by=group:unit)
    

    or (as @Limey suggests)

    library(dplyr)
    slice_max(dt, order_by= value, by=group:unit)
    

    or

    library(data.table)
    setDT(dt)[, .SD[value==max(value)], .(group, unit)]
    

    Output:

       group   unit treatment value        etc
       <int> <char>    <char> <int>     <char>
    1:     1      A         x     9       pear
    2:     1      B         y     6  raspberry
    3:     1      C         w    32     banana
    4:     2      A         w    12    apricot
    5:     2      B         z     4      apple
    6:     2      C         y    68 blackberry