Create all combinations of two variables from two dataframes while keeping all other variables in R


I have minimal data in two data frames and want all combinations of two variables, one from each data frame. expand.grid() produces the combinations, but is there a function that also preserves all the other variables from both data frames?

df1 <- data.frame(var1 = c("A", "B", "C"), var2 = c(1,2,3))
df2 <- data.frame(var1 = c("D", "E", "F"), var2 = c(4,5,6))
expand.grid(df1$var1, df2$var1)

  Var1 Var2
1    A    D
2    B    D
3    C    D
4    A    E
5    B    E
6    C    E
7    A    F
8    B    F
9    C    F

The expected result contains all combinations plus all the other variables, perhaps with a suffix:

  Var1.x Var1.y var2.x var2.y
1      A      D      1      4
2      B      D      2      4
3      C      D      3      4
4      A      E      1      5
5      B      E      2      5
6      C      E      3      5
7      A      F      1      6
8      B      F      2      6
9      C      F      3      6

Solution

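  • With dplyr, full_join(x, y, by = character()) performs a cross join, generating every combination of the rows of x and y. Since dplyr 1.1.0 this form is deprecated in favour of cross_join(x, y), which returns the same result.

    library(dplyr)
    
    # Cross join: every row of df1 paired with every row of df2.
    # In dplyr >= 1.1.0, prefer cross_join(df1, df2).
    full_join(df1, df2, by = character())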
    
    #   var1.x var2.x var1.y var2.y
    # 1      A      1      D      4
    # 2      A      1      E      5
    # 3      A      1      F      6
    # 4      B      2      D      4
    # 5      B      2      E      5
    # 6      B      2      F      6
    # 7      C      3      D      4
    # 8      C      3      E      5
    # 9      C      3      F      6
    

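    Alternatively, expand_grid() or crossing() from tidyr create a tibble from all combinations of their inputs. Data frames passed as inputs are combined row by row, so their other columns are kept. Because df1 and df2 share column names, .name_repair is used below to append .x to the two columns from df1 and .y to the two columns from df2.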

    library(tidyr)
    
    crossing(df1, df2, .name_repair = ~ paste0(.x, rep(c('.x', '.y'), each = 2)))
    

    crossing() is a wrapper around expand_grid() that de-duplicates and sorts its inputs.
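
If adding a package is not an option, base R's merge() can serve as a cross join as well. This is a different technique from the dplyr and tidyr approaches above, shown only as a minimal sketch: with by = NULL there are no key columns, so merge() returns the Cartesian product and applies its default suffixes .x and .y to the clashing names. The final column reordering is optional and only mirrors the layout sketched in the question.

```r
# Example data from the question
df1 <- data.frame(var1 = c("A", "B", "C"), var2 = c(1, 2, 3))
df2 <- data.frame(var1 = c("D", "E", "F"), var2 = c(4, 5, 6))

# by = NULL means "no key columns", so merge() returns the Cartesian product;
# duplicated column names receive the default suffixes ".x" and ".y"
res <- merge(df1, df2, by = NULL)

# Optional: reorder the columns to match the layout asked for in the question
res <- res[, c("var1.x", "var1.y", "var2.x", "var2.y")]
res
```

With this approach the rows of df1 should vary fastest, which is the row order the question asks for, whereas the dplyr cross join above varies df2 fastest.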