How to generate data frame from pairwise combinations of levels


I want to generate a data frame by pairing each factor level with one fixed, shared level. The code below works, but I want to generalize it to an arbitrary number of levels, taking only these inputs: the data frame df, the variable to split over var1, the shared level A, and the name of the new variable strat. I also want to be able to use the function in a pipe so that further operations can follow. Any help would be much appreciated.

Here is my attempt:

library(dplyr)

var1 <- c("A", "B", "C", "A", "B", "C", "A", "B", "C", "B")
var2 <- seq(2000, 2009, 1)
var3 <- sample(1:10, 10, replace = TRUE)
var4 <- sample(1:10, 10, replace = TRUE)
df <- data.frame(var1, var2, var3, var4)

# Split into one data frame per level of var1 (A, B, C)
df2 <- df %>% group_split(var1)

# Pair the shared level A with B, then with C
dfB <- rbind(df2[[1]], df2[[2]]) %>% transform(strat = "BA")
dfC <- rbind(df2[[1]], df2[[3]]) %>% transform(strat = "CA")

df3 <- rbind(dfB, dfC)

df3
   var1 var2 var3 var4 strat
1     A 2000    8    5    BA
2     A 2003    5    7    BA
3     A 2006    1    6    BA
4     B 2001    3    6    BA
5     B 2004    6    9    BA
6     B 2007    8   10    BA
7     B 2009    5    5    BA
8     A 2000    8    5    CA
9     A 2003    5    7    CA
10    A 2006    1    6    CA
11    C 2002    9    5    CA
12    C 2005    3    5    CA
13    C 2008    5    1    CA

Solution

  • Here is another way. We keep the "A" rows in a separate data frame, group_split the remaining rows by var1, then bind the "A" rows to each piece and add a new column strat by pasting the first value of var1 in that piece with "A".

    library(dplyr)
    
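    # Rows for the shared level, reused in every pair
    A_df <- df %>% filter(var1 == "A")
    
    df %>%
       filter(var1 != "A") %>%
       group_split(var1) %>%   # one data frame per remaining level
       # bind_rows() puts the non-"A" rows first, so first(var1) is that level
       purrr::map_df(. %>% bind_rows(A_df) %>% mutate(strat = paste0(first(var1), "A")))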
    
    
    #  var1   var2  var3  var4 strat
    #  <fct> <dbl> <int> <int> <chr>
    # 1 B      2001     5     5 BA   
    # 2 B      2004    10    10 BA   
    # 3 B      2007     5     4 BA   
    # 4 B      2009     9     6 BA   
    # 5 A      2000     5     9 BA   
    # 6 A      2003     6     2 BA   
    # 7 A      2006     9     1 BA   
    # 8 C      2002    10     5 CA   
    # 9 C      2005     7     9 CA   
    #10 C      2008     5     3 CA   
    #11 A      2000     5     9 CA   
    #12 A      2003     6     2 CA   
    #13 A      2006     9     1 CA
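
The question asks for a reusable, pipe-friendly function, which the snippet above does not quite provide. One possible way to wrap the same idea is sketched below. The function name pair_with_shared and its argument names are hypothetical, chosen only for illustration. The sketch assumes the split column has no missing values, and it orders the pairs by first appearance of each level rather than by sorted level.

```r
library(dplyr)

# Hypothetical helper; the name and argument names are illustrative assumptions.
pair_with_shared <- function(df, var, shared, strat = "strat") {
  # Compare as character so factor and character columns behave the same
  values <- as.character(pull(df, {{ var }}))
  # Remaining levels, in order of first appearance
  others <- setdiff(unique(values), shared)
  # Rows for the shared level, reused in every pair
  shared_df <- df[which(values == shared), , drop = FALSE]

  pairs <- lapply(others, function(lvl) {
    # Rows of the current level followed by the shared rows
    pair <- bind_rows(df[which(values == lvl), , drop = FALSE], shared_df)
    pair[[strat]] <- paste0(lvl, shared)   # e.g. "BA"
    pair
  })
  bind_rows(pairs)
}

# Same var1/var2 layout as in the question (random columns omitted)
df <- data.frame(
  var1 = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "B"),
  var2 = 2000:2009
)

res <- df %>%
  pair_with_shared(var1, "A", strat = "strat") %>%
  arrange(strat, var2)   # further piped steps work as usual

table(res$strat)
```

Because the data frame is the first argument, the function can sit anywhere in a pipe. With the example data, the BA group has 7 rows (4 B plus 3 A) and the CA group has 6 (3 C plus 3 A), matching the row counts in the outputs above.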