Tags: r, dplyr, sum, tidyverse

Finding combined sums of pairs of rows in R


I have a data frame of test results: each row is a player, and columns Q1 ... Q6 are the questions. I want to find out which pair of players achieves the highest combined score:

# Generating sample data.
n = 6

set.seed(1986)

results_df = data.frame(Player = c("A", "B", "C", "D", "E", "F"), 
                     Q1 = sample(0:1, size = n, replace = TRUE), 
                     Q2 = sample(0:1, size = n, replace = TRUE),
                     Q3 = sample(0:1, size = n, replace = TRUE),
                     Q4 = sample(0:1, size = n, replace = TRUE),
                     Q5 = sample(0:1, size = n, replace = TRUE),
                     Q6 = sample(0:1, size = n , replace = TRUE))


head(results_df)

  Player Q1 Q2 Q3 Q4 Q5 Q6
1      A  1  0  1  0  0  0
2      B  1  1  0  0  0  0
3      C  0  1  0  1  0  1
4      D  0  1  1  0  1  1
5      E  1  1  1  1  1  1
6      F  1  0  0  1  0  1

The 1s and 0s indicate whether each player got a question right (1) or wrong (0). Now I would like to combine each pair of players to see how well they would have done as a pair, counting a question as correct if at least one of the two answered it correctly.

Does anyone know how I can transform the dataframe above to something looking like this below?

(Here I have just worked out each pair by hand: A had 2 right; combined with B, who had 1 question right that A had wrong, that makes a combined score of 3, and so on...)

  Player  A  B  C  D  E  F
1      A  2  3  5  5  6  4
2      B  3  2  4  5  6  4
3      C  5  4  3  5  6  4
4      D  5  5  5  4  6  6
5      E  6  6  6  6  6  6
6      F  4  4  4  6  6  3
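
To make the scoring rule concrete, here is a minimal sketch for a single pair. The vectors `a` and `b` are illustrative names, with values copied by hand from rows A and B of the printed sample above; the rule assumed is that a question counts for the pair if at least one of the two players got it right.

```r
# Rows A and B from the printed sample data (Q1..Q6), copied by hand
a <- c(1, 0, 1, 0, 0, 0)
b <- c(1, 1, 0, 0, 0, 0)

# A question counts for the pair if at least one player got it right
pair_score <- sum(a | b)
pair_score  # 3, matching the A/B cell of the desired table
```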

Solution

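  • A base R option with outer: split the score matrix into one named vector per player, then, for every pair of players, count the questions that at least one of them answered correctly.

    > # One 0/1 vector per player, named by Player
    > lst <- asplit(`row.names<-`(as.matrix(results_df[-1]), results_df$Player), 1)
    
    > # x + y > 0 is TRUE where either player answered the question correctly
    > outer(lst, lst, FUN = Vectorize(function(x, y) sum(x + y > 0)))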
      A B C D E F
    A 2 3 5 5 6 4
    B 3 2 4 5 6 4
    C 5 4 3 5 6 4
    D 5 5 5 4 6 6
    E 6 6 6 6 6 6
    F 4 4 4 6 6 3
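
As a possible alternative, and not part of the answer above, the same table can be sketched with matrix algebra: count the questions both players got wrong and subtract that from the number of questions. The matrix `m` below is typed in from the printed sample so the snippet does not depend on the random seed, and the names `m`, `pair_scores`, `best`, and `best_pairs` are illustrative. The last step addresses the original goal of finding the highest-scoring pair, assuming only distinct pairs are of interest.

```r
# Sample data copied from the printed output in the question
m <- matrix(c(1, 0, 1, 0, 0, 0,
              1, 1, 0, 0, 0, 0,
              0, 1, 0, 1, 0, 1,
              0, 1, 1, 0, 1, 1,
              1, 1, 1, 1, 1, 1,
              1, 0, 0, 1, 0, 1),
            nrow = 6, byrow = TRUE,
            dimnames = list(LETTERS[1:6], paste0("Q", 1:6)))

# tcrossprod(1 - m) counts, per pair, the questions both players got wrong;
# subtracting from the number of questions gives the combined score
pair_scores <- ncol(m) - tcrossprod(1 - m)

# Restrict to distinct pairs (upper triangle) and find the best score
upper <- upper.tri(pair_scores)
best <- max(pair_scores[upper])

# Row/column positions of every distinct pair reaching that score
idx <- which(pair_scores == best & upper, arr.ind = TRUE)
best_pairs <- paste(rownames(pair_scores)[idx[, "row"]],
                    colnames(pair_scores)[idx[, "col"]], sep = "-")
```

For the sample data, `pair_scores` should reproduce the table above, and the maximum of 6 is reached by every pair that includes E as well as by D and F.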