Search code examples
rloopshierarchyflatten

How to flatten dataframe variable from another dataframe in R


I have two data frames. One is used as a group definition (palette) with their respective pieces (colors). Some of them would be formed by combinations. On the other hand, I have a test data frame with different combinations of groups (non-strictly-palette), like color + palette. I would like to have a final data frame, with all non-strictly-palette with their respective pieces (colors).


# Definition dataframe ----

n = 3

Blues   = paste0('blue',  seq_len(n))
Greens  = paste0('green', seq_len(n))
Rainbow = c('red', 'Greens')

d_create = data.frame(
  group = c(
    rep(c('Blues', 'Greens'), each = n),
    rep('Rainbow', 2)
  ),
  piece = c(Blues, Greens, Rainbow)
)

d_create

#     group  piece
# 1   Blues  blue1
# 2   Blues  blue2
# 3   Blues  blue3
# 4  Greens green1
# 5  Greens green2
# 6  Greens green3
# 7 Rainbow    red
# 8 Rainbow Greens

# Test dataframe ----

Rainbow_plus = c('orange', 'Blues', 'Rainbow')

d_test = data.frame(
  group = c(
    rep('Blues',        length(Blues)),
    rep('Rainbow_plus', length(Rainbow_plus))
  ),
  piece = c(Blues, Rainbow_plus)
)

d_test

#          group   piece
# 1        Blues   blue1
# 2        Blues   blue2
# 3        Blues   blue3
# 4 Rainbow_plus  orange
# 5 Rainbow_plus   Blues
# 6 Rainbow_plus Rainbow

# Desired dataframe ----

d_desired = data.frame(
  group = c(
    rep('Blues', n),
    rep('Rainbow_plus', (2*n+2))
  ),
  piece = c(
    Blues,
    c('orange', Blues, 'red', Greens)
  )
)

d_desired

#           group  piece
# 1         Blues  blue1
# 2         Blues  blue2
# 3         Blues  blue3
# 4  Rainbow_plus orange
# 5  Rainbow_plus  blue1
# 6  Rainbow_plus  blue2
# 7  Rainbow_plus  blue3
# 8  Rainbow_plus    red
# 9  Rainbow_plus green1
# 10 Rainbow_plus green2
# 11 Rainbow_plus green3

EDITED:

I've changed dataframe examples. Now, d_create consist on:

  1. "Blues" one_depth_group
  2. "Greens" one_depth_group
  3. "Rainbow": (mix_group) simple+one_depth_group

While on d_test, we have:

  1. "Blues": one_depth_group
  2. "Rainbow_plus": simple+one_depth_group+mix_group

Note that red and orange are new colors not previous defined, so they remain unchanged.

I think I have to loop over d_test$piece, and check if each piece is present on d_create$group. If it is present, expand to their corresponding peaces. If not, remain unchanged.

lapply(d_test$piece, function(x) {
    check1 = x %in% d_create$group
    if (!check1) {
      x
    } else {
      lapply(d_test[d_test$group == x,]$piece, function(z){
        check2 = z %in% d_create$group
        if (!check2) {
          z
        } else {
          lapply(d_test[d_test$group == z, ]$piece, function(m){
            check3 = m %in% d_create$group
            if (!check3) {
              m
            } else {
              'infinite_loop'
            }
          })
        }
      })
    }

  })

But I don't want to write each possible loop, because in real world, it would be unpredictable for future flatten job. Maybe using while. Some help? thanks


Solution

  • You may code a while loop, in this case-

    • First change the column name of df_test (transaction table) with suffix 1 into the corresponding column name of d_create (master table) so that loop may be started and end point may also be defined.
    • In every iteration of the while loop, left_join your transaction table with master table so that you get an extra column in your transaction table along with respective level of hierarchy (first level in first hierarchy).
    • After that coalesce your first column of transaction table (resulted) with the newly created column.
    • The loop will end only when there are no further values to match from master table, i.e. new column if created will contain only same values and no extra value.

    I hope I have made the logic pretty clear.

    library(dplyr)
    
    #rename one column of d_test
    d_test <- d_test %>% rename(piece1 = piece)
    
    #actual while loop
    j = 1
    while(any(d_test[,(paste0('piece',j))] %in% d_create$group)){
      d_test %>% left_join(d_create, by = c('piece1' = 'group')) %>%
        rename(!!paste0('piece', j +1) := piece) %>%
        mutate(piece1 = coalesce(get(paste0('piece', j+1)), piece1)) -> d_test
      j = j +1
    }
    
    #desired output
    d_test %>% select(group, piece1)
    
              group piece1
    1         Blues  blue1
    2         Blues  blue2
    3         Blues  blue3
    4  Rainbow_plus orange
    5  Rainbow_plus  blue1
    6  Rainbow_plus  blue2
    7  Rainbow_plus  blue3
    8  Rainbow_plus    red
    9  Rainbow_plus green1
    10 Rainbow_plus green2
    11 Rainbow_plus green3