I face the following problem. I have a dataset that I simplify this way
df <- tibble(ID_1=c("a","a", "a","a", "a", "b", "b", "b", "b"), ID_2 = c("a","b", "c","d", "e", "b", "a", "c", "d"),
other= c(1, "bu", 34, 3, 5, 6, "jud", 82, 9), exchange=c(1, 13, 34, 3, 5, 6, 8, 82, 9),
goods=c("wood","wood", "wood","wood", "wood", "steel", "steel", "steel", "steel"))
df
We have exchanges between multiple countries that have different IDs. I want to sum all the exchanges between country A and country B. Between country A and country C. etc. the goods exchanged. That is, in the end, I'd like to have
ID_1 // ID_2 // Sum_exchangeAB
ID_1 // ID_3 // Sum_exchangeBC
....
ID_n // ID_n // Sumexchangenn
How can I do that in Dplyr? I tried group_by (ID_1, ID_2) then aggregate. Or summarise.
I tried for example the following: dataset2 %>% group_by(ID_1, ID_2, exchange) %>% summarise(dataset2, sum_of_exchange=sum(exchange)) . But this gives me the sum of all the exchange (that is, it returns one value). Not the sum of exchange between two countries.
Thank you for your help, much appreciated!!
Is that what you are looking for?
df %>%
rowwise() %>%
mutate(new_id = paste0(str_sort(c(ID_1, ID_2)), collapse = '-')) %>%
group_by(new_id) %>%
summarize(exchange = sum(exchange))
which gives:
# A tibble: 8 x 2
new_id exchange
<chr> <dbl>
1 a-a 1
2 a-b 21
3 a-c 34
4 a-d 3
5 a-e 5
6 b-b 6
7 b-c 82
8 b-d 9
UPDATE: it seems the TO wants to append the sum to the original data which can be done with:
df %>%
rowwise() %>%
mutate(new_id = paste0(str_sort(c(ID_1, ID_2)), collapse = '-')) %>%
group_by(new_id) %>%
mutate(sum_exchange = sum(exchange)) %>%
ungroup()
which gives:
# A tibble: 9 x 7
ID_1 ID_2 other exchange goods new_id sum_exchange
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl>
1 a a 1 1 wood a-a 1
2 a b bu 13 wood a-b 21
3 a c 34 34 wood a-c 34
4 a d 3 3 wood a-d 3
5 a e 5 5 wood a-e 5
6 b b 6 6 steel b-b 6
7 b a jud 8 steel a-b 21
8 b c 82 82 steel b-c 82
9 b d 9 9 steel b-d 9