Search code examples
rmergeapplycbind

c-bind from a list of data tables without recycling and with using list names as part of column names


This is a follow-up question to a very similar issue where I already received great help here but I'm failing to modify the solution to the new requirements.

I have a list of data frames with different row count each. I would like to use the name of each data frame to rename all (not just the 1st) columns in each data frame. Then I would like to add a column with a row number into each of the data frames in order to use merge to cbind all of them.

Again my small example to create a short list of data frames:

# Create dummy list of data frames
df1 <- data.frame(border=c(1,2,3), value=c(4,5,6))
df2 <- data.frame(border=as.factor(c("A","B")), value=c(3,5.7))
df_lst <- list(df1, df2)
names(df_lst) <- c("df1","df2")

This time the desired output should look like so:

  df1 border df1 value df2 border df2 value
1          1         4          A       3.0
2          2         5          B       5.7
3          3         6       <NA>        NA

I can describe how to get there in this small toy example in code by accessing the individual data frames but I'm hoping for a solution as elegant as the one suggested by @Maël in my previous question which applies the solution directly to the list of data frames:

# Append the list names hard coded to all column names
# but this should be dynamic within the list not using the data frames
names(df1) <- c("df1 border", "df1 value")
names(df2) <- c("df2 border", "df2 value")

# Add a column which can be used in merge
df1$row_id <- row.names(df1)
df2$row_id <- row.names(df2)

# Merge all dataframes adding rows w/ <NA> where required
# but this should use the list not the individual data frames
df <- merge(df1, df2, all.x=T, all.y=T)

Solution

  • We could do something like this: The main task is that cbind needs equal number of rows. In this example we use reframe(cur_data()[seq(3),], .by=id) %>% after binding to a dataframe. Here it is hardcoded with 3 but we can overcome easily by max_rows <- max(map_int(df_lst, nrow)). After this we need a little pivoting tricks:

    library(tidyverse)
    
    df_lst %>%
      map(~mutate_all(.x, as.character)) %>% 
      bind_rows(.id = 'id') %>% 
      reframe(cur_data()[seq(3),], .by=id) %>% 
      group_by(id) %>%
      mutate(row = row_number()) %>%
      pivot_wider(names_from = id, values_from = c(border, value)) %>%
      select(-row)
    
      border_df1 border_df2 value_df1 value_df2
      <chr>      <chr>      <chr>     <chr>    
    1 1          A          4         3        
    2 2          B          5         5.7      
    3 3          NA         6         NA