I have two data frames. One is used as a group definition (palette) with their respective pieces (colors). Some of them would be formed by combinations. On the other hand, I have a test data frame with different combinations of groups (non-strictly-palette), like color + palette. I would like to have a final data frame, with all non-strictly-palette with their respective pieces (colors).
# Definition dataframe ----
n = 3
Blues = paste0('blue', seq_len(n))
Greens = paste0('green', seq_len(n))
Rainbow = c('red', 'Greens')
d_create = data.frame(
group = c(
rep(c('Blues', 'Greens'), each = n),
rep('Rainbow', 2)
),
piece = c(Blues, Greens, Rainbow)
)
d_create
# group piece
# 1 Blues blue1
# 2 Blues blue2
# 3 Blues blue3
# 4 Greens green1
# 5 Greens green2
# 6 Greens green3
# 7 Rainbow red
# 8 Rainbow Greens
# Test dataframe ----
Rainbow_plus = c('orange', 'Blues', 'Rainbow')
d_test = data.frame(
group = c(
rep('Blues', length(Blues)),
rep('Rainbow_plus', length(Rainbow_plus))
),
piece = c(Blues, Rainbow_plus)
)
d_test
# group piece
# 1 Blues blue1
# 2 Blues blue2
# 3 Blues blue3
# 4 Rainbow_plus orange
# 5 Rainbow_plus Blues
# 6 Rainbow_plus Rainbow
# Desired dataframe ----
d_desired = data.frame(
group = c(
rep('Blues', n),
rep('Rainbow_plus', (2*n+2))
),
piece = c(
Blues,
c('orange', Blues, 'red', Greens)
)
)
d_desired
# group piece
# 1 Blues blue1
# 2 Blues blue2
# 3 Blues blue3
# 4 Rainbow_plus orange
# 5 Rainbow_plus blue1
# 6 Rainbow_plus blue2
# 7 Rainbow_plus blue3
# 8 Rainbow_plus red
# 9 Rainbow_plus green1
# 10 Rainbow_plus green2
# 11 Rainbow_plus green3
EDITED:
I've changed dataframe examples. Now, d_create consist on:
While on d_test, we have:
Note that red
and orange
are new colors not previous defined, so they remain unchanged.
I think I have to loop over d_test$piece
, and check if each piece is present on d_create$group
. If it is present, expand to their corresponding peaces. If not, remain unchanged.
lapply(d_test$piece, function(x) {
check1 = x %in% d_create$group
if (!check1) {
x
} else {
lapply(d_test[d_test$group == x,]$piece, function(z){
check2 = z %in% d_create$group
if (!check2) {
z
} else {
lapply(d_test[d_test$group == z, ]$piece, function(m){
check3 = m %in% d_create$group
if (!check3) {
m
} else {
'infinite_loop'
}
})
}
})
}
})
But I don't want to write each possible loop, because in real world, it would be unpredictable for future flatten job. Maybe using while
. Some help? thanks
You may code a while loop, in this case-
df_test
(transaction table) with suffix 1
into the corresponding column name of d_create
(master table) so that loop may be started and end point may also be defined.left_join
your transaction table with master table so that you get an extra column in your transaction table along with respective level of hierarchy (first level in first hierarchy).coalesce
your first column of transaction table (resulted) with the newly created column.I hope I have made the logic pretty clear.
library(dplyr)
#rename one column of d_test
d_test <- d_test %>% rename(piece1 = piece)
#actual while loop
j = 1
while(any(d_test[,(paste0('piece',j))] %in% d_create$group)){
d_test %>% left_join(d_create, by = c('piece1' = 'group')) %>%
rename(!!paste0('piece', j +1) := piece) %>%
mutate(piece1 = coalesce(get(paste0('piece', j+1)), piece1)) -> d_test
j = j +1
}
#desired output
d_test %>% select(group, piece1)
group piece1
1 Blues blue1
2 Blues blue2
3 Blues blue3
4 Rainbow_plus orange
5 Rainbow_plus blue1
6 Rainbow_plus blue2
7 Rainbow_plus blue3
8 Rainbow_plus red
9 Rainbow_plus green1
10 Rainbow_plus green2
11 Rainbow_plus green3