I want to generate a dataframe from a combination of factor levels, with one fixed level shared across every combination. I have working code, shown below, but I want to generalize it so that it works for an arbitrary number of levels, taking as input only the following: the dataframe df, the variable to split over var1, the level to be shared A, and the name of the new variable strat. I want to be able to use this function with pipes, to allow additional operations afterwards. Any help would be much appreciated.
Here is my attempt:
library(dplyr)

var1 <- c("A", "B", "C", "A", "B", "C", "A", "B", "C", "B")
var2 <- seq(2000, 2009, 1)
var3 <- sample(1:10, 10, replace = TRUE)
var4 <- sample(1:10, 10, replace = TRUE)
df <- data.frame(var1, var2, var3, var4)

# Split into one data frame per level of var1 (ordered A, B, C)
df2 <- df %>% group_split(var1)

# Pair the shared level A with each other level and label the pair
dfB <- rbind(df2[[1]], df2[[2]]) %>% transform(strat = "BA")
dfC <- rbind(df2[[1]], df2[[3]]) %>% transform(strat = "CA")

df3 <- rbind(dfB, dfC)
df3
   var1 var2 var3 var4 strat
1     A 2000    8    5    BA
2     A 2003    5    7    BA
3     A 2006    1    6    BA
4     B 2001    3    6    BA
5     B 2004    6    9    BA
6     B 2007    8   10    BA
7     B 2009    5    5    BA
8     A 2000    8    5    CA
9     A 2003    5    7    CA
10    A 2006    1    6    CA
11    C 2002    9    5    CA
12    C 2005    3    5    CA
13    C 2008    5    1    CA
Here is another way. We pull the "A" group out separately, group_split the remaining rows by var1, bind the "A" rows onto each piece, and then add a new column strat by pasting the first value of var1 in each piece with "A".
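library(dplyr)

# Rows of the shared level, reused for every other level
A_df <- df %>% filter(var1 == "A")

df %>%
  filter(var1 != "A") %>%
  group_split(var1) %>%
  # For each remaining level: append the "A" rows, then label the pair
  purrr::map_df(. %>% bind_rows(A_df) %>% mutate(strat = paste0(first(var1), "A")))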
#    var1   var2  var3  var4 strat
#    <fct> <dbl> <int> <int> <chr>
#  1 B      2001     5     5 BA
#  2 B      2004    10    10 BA
#  3 B      2007     5     4 BA
#  4 B      2009     9     6 BA
#  5 A      2000     5     9 BA
#  6 A      2003     6     2 BA
#  7 A      2006     9     1 BA
#  8 C      2002    10     5 CA
#  9 C      2005     7     9 CA
# 10 C      2008     5     3 CA
# 11 A      2000     5     9 CA
# 12 A      2003     6     2 CA
# 13 A      2006     9     1 CA
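
To get the pipe-friendly function the question asks for, the same idea can be wrapped up roughly as follows. This is only a sketch: the name add_shared_level and its argument names (var, shared, strat_name) are made up for illustration, and it assumes the split variable has no missing values. It accepts the column either bare or as a string, and orders the pieces by sorted level, much like group_split does.

```r
library(dplyr)

# Hypothetical helper; the function name and argument names are illustrative only.
# Assumes the split column contains no NA values.
add_shared_level <- function(df, var, shared, strat_name = "strat") {
  # Accept the column as a bare name or a string and turn it into a string
  var_name <- rlang::as_name(rlang::ensym(var))
  key <- as.character(df[[var_name]])

  # Rows of the shared level, reused in every stratum
  shared_rows <- df[which(key == shared), , drop = FALSE]

  # Every other level, in sorted order
  other_levels <- sort(setdiff(unique(key), shared))

  pieces <- lapply(other_levels, function(lev) {
    # Rows of this level followed by the shared rows, labelled e.g. "BA"
    out <- bind_rows(df[which(key == lev), , drop = FALSE], shared_rows)
    out[[strat_name]] <- paste0(lev, shared)
    out
  })

  bind_rows(pieces)
}

# Usage with the df from the question; further verbs can be piped on afterwards
# df %>% add_shared_level(var1, "A", "strat") %>% count(strat)
```

Because the data frame is the first argument and a data frame is returned, the call slots into a pipeline like any dplyr verb.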