Search code examples
rsumaggregate

Aggregate with respect to condition


I face the following problem. I have a dataset that I simplify this way

df <- tibble(ID_1=c("a","a", "a","a", "a", "b", "b", "b", "b"), ID_2 = c("a","b", "c","d", "e", "b", "a", "c", "d"), 
             other= c(1, "bu", 34, 3, 5, 6, "jud", 82, 9), exchange=c(1, 13, 34, 3, 5, 6, 8, 82, 9), 
             goods=c("wood","wood", "wood","wood", "wood", "steel", "steel", "steel", "steel"))
df

We have exchanges between multiple countries that have different IDs. I want to sum all the exchanges between country A and country B. Between country A and country C. etc. the goods exchanged. That is, in the end, I'd like to have

ID_1 // ID_2 // Sum_exchangeAB
ID_1 // ID_3 // Sum_exchangeBC
....
ID_n // ID_n // Sumexchangenn

How can I do that in Dplyr? I tried group_by (ID_1, ID_2) then aggregate. Or summarise.

I tried for example the following: dataset2 %>% group_by(ID_1, ID_2, exchange) %>% summarise(dataset2, sum_of_exchange=sum(exchange)) . But this gives me the sum of all the exchange (that is, it returns one value). Not the sum of exchange between two countries.

Thank you for your help, much appreciated!!


Solution

  • Is that what you are looking for?

    df %>%
      rowwise() %>%
      mutate(new_id = paste0(str_sort(c(ID_1, ID_2)), collapse = '-')) %>%
      group_by(new_id) %>%
      summarize(exchange = sum(exchange))
    

    which gives:

    # A tibble: 8 x 2
      new_id exchange
      <chr>     <dbl>
    1 a-a           1
    2 a-b          21
    3 a-c          34
    4 a-d           3
    5 a-e           5
    6 b-b           6
    7 b-c          82
    8 b-d           9
    

    UPDATE: it seems the TO wants to append the sum to the original data which can be done with:

    df %>%
      rowwise() %>%
      mutate(new_id = paste0(str_sort(c(ID_1, ID_2)), collapse = '-')) %>%
      group_by(new_id) %>%
      mutate(sum_exchange = sum(exchange)) %>%
      ungroup()
    

    which gives:

    # A tibble: 9 x 7
      ID_1  ID_2  other exchange goods new_id sum_exchange
      <chr> <chr> <chr>    <dbl> <chr> <chr>         <dbl>
    1 a     a     1            1 wood  a-a               1
    2 a     b     bu          13 wood  a-b              21
    3 a     c     34          34 wood  a-c              34
    4 a     d     3            3 wood  a-d               3
    5 a     e     5            5 wood  a-e               5
    6 b     b     6            6 steel b-b               6
    7 b     a     jud          8 steel a-b              21
    8 b     c     82          82 steel b-c              82
    9 b     d     9            9 steel b-d               9