I have a dataframe with a number of groupings. I want to create a dataframe populated by rows that are the maximum of a given group of groups. Given the dataframe
group unit treatment value etc
1 A w 8 apple
1 A x 9 pear
1 A y 7 orange
1 A z 2 pear
1 B w 4 strawberry
1 B x 3 dragonfruit
1 B y 6 raspberry
1 B z 5 apple
1 C w 32 banana
1 C x 27 peach
1 C y 15 plum
1 C z 28 orange
2 A w 12 apricot
2 A x 11 blackberry
2 A y 10 banana
2 A z 9 raspeberry
2 B w 1 plum
2 B x 2 lemon
2 B y 3 grapefruit
2 B z 4 apple
2 C w 51 fig
2 C x 47 avocado
2 C y 68 blackberry
2 C z 53 dragonfruit
for each group, for each unit, I would like to select the row with the highest value, such that I would end up with:
group unit treatment value etc
1 A x 9 pear
1 B y 6 raspberry
1 C w 32 banana
2 A w 12 apricot
2 B z 4 apple
2 C y 68 blackberry
the etc column is just to highlight that I'd like to select the whole row.
I could write a series of nested loops, but there feels like there has to be something more elegant. Happy for base
or tidyverse
suggestions.
You can do as follows:
library(dplyr)
filter(dt, value==max(value), .by=group:unit)
or (as @Limey suggests)
library(dplyr)
slice_max(dt, order_by= value, by=group:unit)
or
library(data.table)
setDT(dt)[, .SD[value==max(value)], .(group, unit)]
Output:
group unit treatment value etc
<int> <char> <char> <int> <char>
1: 1 A x 9 pear
2: 1 B y 6 raspberry
3: 1 C w 32 banana
4: 2 A w 12 apricot
5: 2 B z 4 apple
6: 2 C y 68 blackberry