I want to group by column A
and then sum values in column C
for distinct values in columns B
and C
. Is it possible to do it inside summarise
clause?
I know that's possible with distinct()
function before aggregation. What about something like that:
Data:
df <- tibble(A = c(1,1,1,2,2), B = c('a','b','b','a','a'), C=c(5,10,10,15,15))
My try that doesn't work:
df %>%
group_by(A) %>%
summarise(sumC=sum(distinct(B,C) %>% select(C)))
Desired ouput:
A sumC
1 15
2 15
You could use duplicated
df %>%
group_by(A) %>%
summarise(sumC = sum(C[!duplicated(B)]))
## A tibble: 2 x 2
# A sumC
# <dbl> <dbl>
#1 1 15
#2 2 15
Or with distinct
df %>%
group_by(A) %>%
distinct(B, C) %>%
summarise(sumC = sum(C))
## A tibble: 2 x 2
# A sumC
# <dbl> <dbl>
#1 1 15
#2 2 15