Search code examples
rxlsxrbind

How to read and rbind all .xlsx files in a folder efficiently using read_excel


I am new to R and need to create one dataframe from 80 .xlsx files that mostly share the same columns and are all in the same folder. I want to bind all these files efficiently in a manner that would work if I added or removed files from the folder later. I want to do this without converting the files to .csv, unless someone can show me how to that efficiently for large numbers of files within R itself.

I've previously been reading files individually using the read_excel function from the readxl package. After, I would use rbind to bind them. This was fine for 10 files, but not 80! I've experimented with many solutions offered online however none of these seem to work, largely because they are using functions other than read_excel or formats other than .xlsx. I haven't kept track of many of my failed attempts, so cannot offer code other than one alternate method I tried to adapt to read_excel from the read_csv function.

#Method 1
library(readxl)
library(purr)
library(dplyr)
library(tidyverse)
file.list <- list.files(pattern='*.xlsx')
alldata <- file.list %>%
map(read_excel) %>%
reduce(rbind)

#Output
New names:
* `` -> ...2
Error in rbind(deparse.level, ...) : 
numbers of columns of arguments do not match

Any code on how to do this would be greatly appreciated. Sorry if anything is wrong about this post, it is my first one.

UPDATE: Using the changes suggested by the answers, I'm now using the code:

file.list <- list.files(pattern='*.xlsx')
alldata <- file.list %>%
map_dfr(read_excel) %>%
reduce(bind_rows)

This output now is as follows:

New names:
* `` -> ...2
Error: Column `10.Alert.alone` can't be converted from numeric to character

This happens regardless of which type of bind() function I use in the reduce() slot. If anyone can help with this, please let me know!


Solution

  • You're on the right track here. But you need to use map_dfr instead of plain-vanilla map. map_dfr outputs a data frame (or actually tibble) for each iteration, and combines them via bind_rows.

    This should work:

    library(readxl)
    library(tidyverse)
    file.list <- list.files(pattern='*.xlsx')
    alldata <- file.list %>%
      map_dfr(~read_excel(.x))
    

    Note that this assumes your files all have consistent column names and data types. If they don't, you may have to do some cleaning. (One trick I've used in complex cases is to add a %>% mutate_all(as.character) to the read_excel command inside the map function. That will turn everything into characters, and then you can convert the data types from there.)