I have an R script that creates a data frame with 61 columns. The data.frame is made by reading a bunch of csv files into a list of data.frames, then merging the list such that commonly named columns in each data.frame in the list populate the same column in the resulting data.frame.
Some of the columns that should be combined have been inconsistently named in the csv files (eg date.received vs received.on.date vs date.sample.received), and I was wondering what the best approach to combining them would be.
I had a couple ideas:
is the second approach possible (and how?) or is there a better way?
The second approach is possible and it goes easy with rbind_all
from dplyr
package. Here is how:
First of all, if you have some information about the pattern of the names of columns that should be stacked together, I suggest you to try to fix it before stacking, like:
colnames_synonymous <- c("date.received", "received.on.date", "date.sample.received")
list_of_dfs <- lapply(list_of_dfs, function(df) {
names(df)[names(df) %in% colnames_synonymous] <- "date_received"
return(df)
})
Now you are good to go:
dplyr::rbind_all(list_of_dfs)
Maybe you will have to do some adjustments before getting all columns stacked right, but now, all you need to do is changing the lapply function to do so. I find this way easier than make some columns transformations after rbinding.