I'm close but don't have the syntax correct. I'm trying to select all columns of a data table based on selection of unique combinations of two variables (columns) based on the maximum value of a third. MWE of progress thus far. Thx. J
library(dplyr)
dt1 <- tibble (var1 = c("num1", "num2", "num3", "num4", "num5"),
var2 = rep("A", 5),
var3 = c(rep("B", 2), rep("C", 3)),
var4 = c(5, 10, 3, 7, 19))
dt1 %>% distinct(var2, var3, max(var4), .keep_all = TRUE)
# A tibble: 2 x 5
var1 var2 var3 var4 `max(var4)`
<chr> <chr> <chr> <dbl> <dbl>
1 num1 A B 5 19
2 num3 A C 3 19
which is close, but I want the row where the value of var4 is the max value, within the unique combination of var2 and var3. I'm attempting to get:
# A tibble: 2 x 5
var1 var2 var3 var4 `max(var4)`
<chr> <chr> <chr> <dbl> <dbl>
1 num2 A B 5 10
2 num5 A C 3 19
Do I need a formula for the third argument of the distinct function?
We can add an arrange
statement before the distinct
library(dplyr)
dt1 %>%
arrange(var2, var3, desc(var4)) %>%
distinct(var2, var3, .keep_all = TRUE)
-output
# A tibble: 2 x 4
var1 var2 var3 var4
<chr> <chr> <chr> <dbl>
1 num2 A B 10
2 num5 A C 19
Or another option is slice_max
dt1 %>%
group_by(var2, var3) %>%
mutate(var4new = first(var4)) %>%
slice_max(order_by= var4, n = 1) %>%
ungroup
-output
# A tibble: 2 x 5
var1 var2 var3 var4 var4new
<chr> <chr> <chr> <dbl> <dbl>
1 num2 A B 10 5
2 num5 A C 19 3