Search code examples
rpivottidyversecombinationsself-join

Finding all the combinations of k elements among n columns in tidyverse


I am trying to compute stats for basketball lineups given the play-by-play data. In particular I have a dataframe which more or less looks like this

game_id action_id team_id event p1 p2 p3 p4 p5
G1 1 H 3PA A B C D E
G1 2 V REB A B C D E

I would like to make a sort of pivot_longer but instead of having a column with the names (p1:p5) and the values (A:E), I am looking for a function which allows me to find all combinations of k players ( 1 <= k <= 5 ). For instance, if k = 2, it would be something like:

game_id action_id team_id event p_comb1 p_comb2
G1 1 H 3PA A B
G1 1 H 3PA A C
G1 1 H 3PA A D
G1 1 H 3PA A E
G1 1 H 3PA B C
G1 1 H 3PA B D
G1 1 H 3PA B E
G1 1 H 3PA C D
G1 1 H 3PA C E
G1 1 H 3PA D E
G1 2 V REB A B
G1 2 V REB A C
G1 2 V REB A D
G1 2 V REB A E
G1 2 V REB B C
G1 2 V REB B D
G1 2 V REB B E
G1 2 V REB C D
G1 2 V REB C E
G1 2 V REB D E

In particular my requirements would be to:

  • be able to set k as a parameter of some function
  • avoid manual self joins (if it's not the only possible solution)
  • be able to do this operation after grouping, so I only look for combinations withing the same actions, and not all combinations in the dataset
  • do all within the same pipe, avoiding lapply and for cycles where possible

Does a function like even this exists?

I tried with self joining, which is in fact a possible solution, even if it doesn't look very elegant.


Solution

  • You could do:

    df %>%
      mutate(comb = apply(across(num_range('p', 1:5)), 1, function(x) as.data.frame(t(combn(x, 2))))) %>%
      unnest_longer(comb) %>%
      unpack(comb)