Aggregating rows with shared values and simultaneously choosing value in a separate column to keep in aggregated row

hope you are well. I would like to a) add together values in Column A if the values in Column B are equal to one another AND the values in Column C are equal to one another, and simultaneously b) in the new summed row, only keep the value from Column D matching the maximal value of Column A from that summed group.

I think it is difficult to explain my query without an example.

Let us assume these are the relevant data:

df <- data.frame (A = c(10, 1, 4, 3, 7), 
                  B = c("a", "a", "b", "b", "b"), 
                  C = c(.5, .5, 2.5, 1.5, 2.5), 
                  D = c(54, 36, 94, 57, 49))

resulting in this dataframe:

   A B C   D
1 10 a 0.5 54
2  1 a 0.5 36
3  4 b 2.5 94
4  3 b 1.5 57
5  7 b 2.5 49

Notice that rows 1 and 2 are equivalent in B and C, so they should be summed. But row 1 has the greater value in A of the two, so 54 should be kept instead of 36. It's similar with rows 3 and 5. The end result should be:

A   B   C   D
11  a   .5  54
3   b   1.5 57
11  b   2.5 49

I am partway there. I have found some code that does part a). Either of these does the trick:

aggregate(A ~ B + C, df, sum)

library(data.table)
setDT(df)[, .(summedvar = sum(A)), by = .(A, B)]

However, these approaches delete Column D, unsurprisingly. I am curious if anyone has any ideas about how to incorporate part b). Maybe I need to do multiple steps? Or maybe I'm going about this the wrong way? I’m deeply grateful for any advice.

Solution

You can use which.max to get the index of highest value in A and get the corresponding value of D.

Using dplyr you may do -

library(dplyr)

df %>%
  group_by(B, C) %>%
  summarise(D = D[which.max(A)], 
            A = sum(A), .groups = "drop") %>%
  select(A, B, C, D)

#     A B         C     D
#  <dbl> <chr> <dbl> <dbl>
#1    11 a       0.5    54
#2     3 b       1.5    57
#3    11 b       2.5    49

Similarly in data.table -

library(data.table)

setDT(df)
df[, .(A = sum(A), D = D[which.max(A)]), .(B, C)]