I have several data frames in R with the following structure
> df1
messy_col_name1 messy_group_name1
numeric data "group1"
... ...
numeric data "group1"
> df2
messy_col_name2 messy_group_name2
numeric data "group2"
... ...
numeric data "group2"
.
.
.
> dfN
messy_col_nameN messy_group_nameN
numeric data "groupN"
... ...
numeric data "groupN"
Each of these data frames has two columns: the first holds real values, and the second holds the group name (a factor).
Is there an efficient way to bind these data frames by row without relabelling the column names on each data frame? The final object should also be a data frame, since the aim is to perform an ANOVA using aov(). The end result should look like this:
> df.combined
col_name group
numeric_data "group1"
... ...
numeric_data "group1"
numeric_data "group2"
... ...
numeric_data "group2"
... ...
numeric_data "groupN"
... ...
numeric_data "groupN"
I was not successful using common functions like rbind(), rbind.fill() or bind_rows().
I examined the following posts; however, I was not able to solve this issue:
Many dataframes, different row lengths, similiar columns and dataframe titles, how to bind?
R: rbind data frames with a different column name
The following post came close:
How to rbind different data frames with different column names?
However, the solution in that post is not efficient when there are many data frames.
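Binding data frames by row does require that they have the same column names: rbind() stops with an error when the names differ, while rbind.fill() and bind_rows() match columns by name and pad the unmatched ones with NA. Relabelling per data frame is likely as efficient as any other solution.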
I would make a list of data frames; this allows the use of lapply to rename the columns. Then you can use do.call(rbind) or dplyr::bind_rows().
For example:
library(magrittr) # for the pipes
df.combined <- list(df1, df2, df3) %>%
  lapply(., function(x) setNames(x, c("col_name", "group"))) %>%
  do.call(rbind, .)
Or using dplyr:
library(dplyr)
df.combined <- list(df1, df2, df3) %>%
  lapply(., function(x) setNames(x, c("col_name", "group"))) %>%
  bind_rows()
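If typing out list(df1, df2, ..., dfN) is the painful part, the list itself can be built from the object names. The following is a minimal base R sketch of the whole workflow, from messy data frames to the aov() call. The toy values and the assumption that the objects are named df1, df2, df3 are mine, purely for illustration:

```r
# Toy data with deliberately inconsistent column names (values are made up)
df1 <- data.frame(messy_col_name1 = c(1.2, 0.8, 1.1), messy_group_name1 = "group1")
df2 <- data.frame(messy_col_name2 = c(2.3, 2.9, 2.6), messy_group_name2 = "group2")
df3 <- data.frame(messy_col_name3 = c(3.9, 4.2, 4.4), messy_group_name3 = "group3")

# Collect the data frames by name instead of listing them by hand
# (assumes they follow the pattern df1, df2, ..., dfN)
dfs <- mget(paste0("df", 1:3), envir = environment())

# Give every data frame the same column names, then stack them
renamed <- lapply(dfs, function(x) setNames(x, c("col_name", "group")))
df.combined <- do.call(rbind, renamed)
rownames(df.combined) <- NULL  # drop the "df1.1", "df1.2", ... row names

# Make the grouping variable an explicit factor before fitting
df.combined$group <- factor(df.combined$group)
fit <- aov(col_name ~ group, data = df.combined)
summary(fit)
```

Renaming by position with setNames() only works because every data frame has its columns in the same order (values first, group second), as described in the question.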
I would bet that there is also an elegant solution using one of the mapping functions in the purrr package.
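One possible purrr version, offered as a sketch rather than a tested recommendation: purrr::map() takes the place of lapply(), and dplyr::bind_rows() stacks the result. The toy data frames below are made up for illustration.

```r
library(purrr)
library(dplyr)

# Toy data with inconsistent column names (values are made up)
df1 <- data.frame(messy_col_name1 = c(1.2, 0.8, 1.1), messy_group_name1 = "group1")
df2 <- data.frame(messy_col_name2 = c(2.3, 2.9, 2.6), messy_group_name2 = "group2")

# map() applies the renaming function to each data frame and returns a list;
# bind_rows() then stacks the list elements into a single data frame
df.combined <- list(df1, df2) %>%
  map(function(x) setNames(x, c("col_name", "group"))) %>%
  bind_rows()
```

This is functionally the same as the lapply() version above, so the choice between them is mostly a matter of taste.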