
Row bind many data frames with consistent structure but different column names


I have several data frames in R with the following structure

> df1
 messy_col_name1  messy_group_name1
 numeric data     "group1"
 ...              ...
 numeric data     "group1"

> df2
 messy_col_name2  messy_group_name2
 numeric data     "group2"
 ...              ...
 numeric data     "group2"
 .
 .
 .
> dfN
 messy_col_nameN  messy_group_nameN
 numeric data     "groupN"
 ...              ...
 numeric data     "groupN"

Each of these data frames has two columns. The first column holds real values; the second holds the group name as a string (factor).

I was wondering whether there is an efficient way to bind these data frames by row without relabelling the column names on each data frame. The final object should also be a data frame. The aim is to perform an ANOVA using aov(). The end result should appear like this:

> df.combined
 col_name      group
 numeric_data  "group1"
 ...           ...
 numeric_data  "group1"
 numeric_data  "group2"
 ...           ...
 numeric_data  "group2"
 ...           ...
 numeric_data  "groupN"
 ...           ...
 numeric_data  "groupN"

I was not successful using common functions like rbind(), rbind.fill() or bind_rows().

I examined the following posts, but I was not able to solve the issue:

Many dataframes, different row lengths, similiar columns and dataframe titles, how to bind?

R: rbind data frames with a different column name

The following post came close:

How to rbind different data frames with different column names?

However, the solution in that post is not efficient when there are many data frames.


Solution

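  • Binding data frames by row does require that they have the same column names: rbind() stops with an error when the names do not match, while bind_rows() matches columns by name and fills the unmatched ones with NA. Relabelling each data frame is likely as efficient as any other solution.

    I would put the data frames in a list; this allows the use of lapply() to rename the columns of all of them in one step. You can then combine them with do.call(rbind, ...) or dplyr::bind_rows().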

    For example:

    library(magrittr) # for the pipes
    df.combined <- list(df1, df2, df3) %>% 
      lapply(., function(x) setNames(x, c("col_name", "group"))) %>% 
      do.call(rbind, .)
    

    Or using dplyr:

    library(dplyr)
    df.combined <- list(df1, df2, df3) %>% 
      lapply(., function(x) setNames(x, c("col_name", "group"))) %>% 
      bind_rows()
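
If there are too many data frames to type into list() by hand, one option is to collect them by name first. The following is only a sketch: it assumes the objects follow a naming pattern such as df1, df2, ..., and the three small data frames are made-up stand-ins for the real data.

```r
# Hypothetical stand-ins for the real data frames.
df1 <- data.frame(messy_col_name1 = c(1.2, 3.4), messy_group_name1 = "group1")
df2 <- data.frame(messy_col_name2 = c(5.6, 7.8), messy_group_name2 = "group2")
df3 <- data.frame(messy_col_name3 = c(9.1, 2.3), messy_group_name3 = "group3")

# Find every object named df<number> in the current environment
# and fetch them all into a named list.
df_names <- ls(pattern = "^df[0-9]+$")
df_list <- mget(df_names)

# Give each data frame the same column names, then stack them.
renamed <- lapply(df_list, setNames, c("col_name", "group"))
df.combined <- do.call(rbind, renamed)

# Drop the row names that rbind builds from the list names,
# and store the group as a factor for aov().
rownames(df.combined) <- NULL
df.combined$group <- factor(df.combined$group)
```

Note that ls() returns names in alphabetical order (df10 sorts before df2), which changes the row order but not the content of the combined data frame.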
    

    I would bet that there is also an elegant solution using one of the mapping functions in the purrr package.
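
A purrr version could look like the sketch below. It swaps lapply() for purrr::map() and otherwise follows the same idea; the example data frames are invented for illustration, and the aov() call at the end only shows how the combined data frame might feed the ANOVA mentioned in the question.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-ins for the real data frames.
df1 <- data.frame(messy_col_name1 = c(1.2, 3.4), messy_group_name1 = "group1")
df2 <- data.frame(messy_col_name2 = c(5.6, 7.8), messy_group_name2 = "group2")
df3 <- data.frame(messy_col_name3 = c(9.1, 2.3), messy_group_name3 = "group3")

df.combined <- list(df1, df2, df3) %>%
  # Rename the two columns of every data frame in the list.
  map(function(x) setNames(x, c("col_name", "group"))) %>%
  # Stack the renamed data frames into one.
  bind_rows() %>%
  # Make sure the grouping variable is a factor.
  mutate(group = factor(group))

# One-way ANOVA of the numeric column by group.
fit <- aov(col_name ~ group, data = df.combined)
summary(fit)
```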