Search code examples
rdplyr

How can I find unique combinations of two columns in R and sum two related columns?


I have a table of results from sports matches where the players are stored in columns p1 and p2, and their scores in that match are stored in columns p1.score and p2.score.

p1 p2 p1.score p2.score
Trevor Patrick 5 2
Patrick Trevor 3 7
library(dplyr)
matches <- tibble(p1=c("Trevor","Patrick"),
                  p2=c("Patrick","Trevor"),
                  p1.score=c(5,3),
                  p2.score=c(2,7))

I want to group by the unique combinations in p1 and p2 and then add the scores together in the correct order to get the cumulative score in all matches between p1 and p2. For the example above, the results table should have one row for the matches between Trevor and Patrick, with a cumulative score of 12 for Trevor and 5 for Patrick.

My first thought was to use pmin and pmax within dplyr::group_by to get the unique combinations of p1 and p2, then use summarize to add up p1.score and p2.score.

matches %>%
  group_by(a = pmin(p1,p2),
           b = pmax(p1,p2)) %>%
  summarize(a.wins = sum(p1.score),
            b.wins = sum(p2.score))

# A tibble: 1 × 4
# Groups:   a [1]
  a       b      a.wins b.wins
  <chr>   <chr>   <dbl>  <dbl>
1 Patrick Trevor      8      9

But this doesn't sum correctly because the summarization isn't taking into account the order defined by a and b.


Solution

  • df %>%
       rename_with(~str_c(.x, ".name"),matches('^[^.]+$')) %>%
       rowid_to_column() %>% 
       pivot_longer(-rowid, names_to = c('grp', '.value'), names_sep = '[.]') %>% 
       mutate(grp = toString(sort(name)), .by = rowid) %>% 
       summarise(score = sum(score), .by = c(grp, name)) %>% 
       mutate(grp1 = row_number(), .by = grp) %>% 
       pivot_wider(names_from = grp1, values_from = c(name, score))
    
    # A tibble: 1 × 5
      grp             name_1 name_2  score_1 score_2
      <chr>           <chr>  <chr>     <int>   <int>
    1 Patrick, Trevor Trevor Patrick      12       5