Search code examples
rdataframetidyrrbind

R append 2 data frames with different columns


I would like to append dfToAdd to df, where the first has missing columns. Important detail is that df has 2 types of columns. 1st set of columns are correlating with each other. e.g. group="A" means name="Group A" and color="Blue". There can't be a combination of A-Group A-Red. 2nd type of columns are correlating among themselves. animal="Dog" action="Bark" And I would like to add this second data frame with missing columns of the first type of columns. Those columns should be filled with combinations of the first type of columns like the following dfResult (order of rows don't matter):

df = data.frame(group = c("A", "A", "A", "B", "B", "B"),
                name = c("Group A", "Group A", "Group A", "Group B", "Group B", "Group B"),
                color = c("Blue", "Blue", "Blue", "Red", "Red", "Red"),
                animal = c("Dog", "Cat", "Mouse", "Dog", "Cat", "Mouse"),
                action = c("Bark", "Meow", "Squeak", "Bark", "Meow", "Squeak")
                )

dfToAdd = data.frame(animal = c("Lion", "Bird"), 
                     action = c("Roar", "Chirp"))

dfResult = data.frame(group = c("A", "A", "A", "B", "B", "B", "A", "A", "B", "B"),
                      name = c("Group A", "Group A", "Group A", "Group B", "Group B", "Group B", "Group A", "Group A", "Group B", "Group B"),
                      color = c("Blue", "Blue", "Blue", "Red", "Red", "Red", "Blue", "Blue", "Red", "Red"),
                      animal = c("Dog", "Cat", "Mouse", "Dog", "Cat", "Mouse", "Lion", "Bird", "Lion", "Bird"),
                      action = c("Bark", "Meow", "Squeak", "Bark", "Meow", "Squeak", "Roar", "Chirp", "Roar", "Chirp"))
> df
  group    name color animal action
1     A Group A  Blue    Dog   Bark
2     A Group A  Blue    Cat   Meow
3     A Group A  Blue  Mouse Squeak
4     B Group B   Red    Dog   Bark
5     B Group B   Red    Cat   Meow
6     B Group B   Red  Mouse Squeak
> dfToAdd
  animal action
1   Lion   Roar
2   Bird  Chirp
> dfResult
   group    name color animal action
1      A Group A  Blue    Dog   Bark
2      A Group A  Blue    Cat   Meow
3      A Group A  Blue  Mouse Squeak
4      B Group B   Red    Dog   Bark
5      B Group B   Red    Cat   Meow
6      B Group B   Red  Mouse Squeak
7      A Group A  Blue   Lion   Roar
8      A Group A  Blue   Bird  Chirp
9      B Group B   Red   Lion   Roar
10     B Group B   Red   Bird  Chirp

But the 1st type of columns (group, name, color) is not completely known. I am working with multiple grouping variables of an arbitrary number. You can imagine that there may or may not be be a description column="Group A is a good group" or date="2020.04.13". We only know for sure the columns of the second type: animal and action.


Solution

  • We could do this in a single %>% by sliceing the first row from 'df', select the columns that are not the ones in 'dfToAdd', bind that with the 'dfToAdd', then do the row bind with 'df' and use complete

    library(dplyr)
    library(tidyr)
    library(rlang)
    library(purrr)
    df %>%
           slice(1) %>%
           select(-names(dfToAdd)) %>%  
           uncount(nrow(dfToAdd))  %>%     
           bind_cols(dfToAdd) %>%
           bind_rows(df, .) %>% 
           complete(nesting(!!! syms(names(dfToAdd))), 
                 nesting(!!! syms(setdiff(names(.), names(dfToAdd)))))
    # A tibble: 10 x 5
    #   animal action group name    color
    # * <fct>  <fct>  <fct> <fct>   <fct>
    # 1 Cat    Meow   A     Group A Blue 
    # 2 Cat    Meow   B     Group B Red  
    # 3 Dog    Bark   A     Group A Blue 
    # 4 Dog    Bark   B     Group B Red  
    # 5 Mouse  Squeak A     Group A Blue 
    # 6 Mouse  Squeak B     Group B Red  
    # 7 Bird   Chirp  A     Group A Blue 
    # 8 Bird   Chirp  B     Group B Red  
    # 9 Lion   Roar   A     Group A Blue 
    #10 Lion   Roar   B     Group B Red