Search code examples
rdplyrtidyr

How can I convert two summarized columns to two columns with a binary response in R?


I have a dataframe in R with results of sports matches. Each row represents one match between two players with columns for the date of the match, the player names, and the scores of each player.

event.date home.name away.name home.score away.score
2024-06-15 A B 7 5
2024-06-14 A C 7 2
2024-06-13 B C 3 7
df <- tibble(
  event.date=c('2024-06-15','2024-06-14','2024-06-13'),
  home.name=c('A','B','C'),
  away.name=c('B','C','C'),
  home.score=c(7,7,3),
  away.score=c(5,2,7)
)

I want to turn this "summarized" table into a table listing out the total score for each player in each match by turning the home.score and away.score columns into binary responses. For example, for the match between A and B on 2024-06-15, there should be 12 rows total: 7 with home.score = 1 and away.score = 0, and 5 with home.score = 0 and away.score = 1.

event.date home.name away.name home.score away.score
2024-06-15 A B 1 0
2024-06-15 A B 1 0
2024-06-15 A B 1 0
2024-06-15 A B 1 0
2024-06-15 A B 1 0
2024-06-15 A B 1 0
2024-06-15 A B 1 0
2024-06-15 A B 0 1
2024-06-15 A B 0 1
2024-06-15 A B 0 1
2024-06-15 A B 0 1
2024-06-15 A B 0 1

For the example above, the final table generated from df should have 31 rows - one for each "point" in the scores of all the matches.

I thought I could use tidyr::uncount() to increase the rows of my table by the values of home.score and away.score, but the weights argument only takes one column as the parameter - I can't use it across both home.score and away.score.

> uncount(df,weights=home.score)

# A tibble: 17 × 4
   event.date home.name away.name away.score
   <chr>      <chr>     <chr>          <dbl>
 1 2024-06-15 A         B                  5
 2 2024-06-15 A         B                  5
 3 2024-06-15 A         B                  5
 4 2024-06-15 A         B                  5
 5 2024-06-15 A         B                  5
 6 2024-06-15 A         B                  5
 7 2024-06-15 A         B                  5
 8 2024-06-14 B         C                  2
 9 2024-06-14 B         C                  2
10 2024-06-14 B         C                  2
11 2024-06-14 B         C                  2
12 2024-06-14 B         C                  2
13 2024-06-14 B         C                  2
14 2024-06-14 B         C                  2
15 2024-06-13 C         C                  7
16 2024-06-13 C         C                  7
17 2024-06-13 C         C                  7

Solution

  • reframe seems like a good candidate to do this. My first attempt was:

    df |> reframe(
      home.score = rep(c(1, 0), c(home.score, away.score)),
      away.score = rep(c(0, 1), c(home.score, away.score)),
      .by = c(event.date, home.name, away.name)
    )
    

    Which I think is nice and readable code. But unfortunately, the home.score value has already been modified when we want to calculate the away.score, so we have to use this more confusing version, where we use the sum of the home score (i.e. the sum of the vector of 0s and 1s):

    df |> reframe(
      home.score = rep(c(1, 0), c(home.score, away.score)),
      away.score = rep(c(0, 1), c(sum(home.score), away.score)),
      .by = c(event.date, home.name, away.name)
    )
    

    Which gives:

    # A tibble: 31 × 5
       event.date home.name away.name home.score away.score
       <chr>      <chr>     <chr>          <dbl>      <dbl>
     1 2024-06-15 A         B                  1          0
     2 2024-06-15 A         B                  1          0
     3 2024-06-15 A         B                  1          0
     4 2024-06-15 A         B                  1          0
     5 2024-06-15 A         B                  1          0
     6 2024-06-15 A         B                  1          0
     7 2024-06-15 A         B                  1          0
     8 2024-06-15 A         B                  0          1
     9 2024-06-15 A         B                  0          1
    10 2024-06-15 A         B                  0          1
    # ℹ 21 more rows
    # ℹ Use `print(n = ...)` to see more rows