Search code examples
rtidyversepurrrreadxl

Read one worksheet from multiple excel files using purrr and readxl and add field


Suppose I have two excel files named packs.xlsx and each contains multiple sheets. I want to iteratively create a dataframe using only 1 sheet from each file, each named "summary". How can I go about this using Purrr and readxl while also adding a field which contains the filename?

I'm successful when I save the sheets as CSVs using the following code:

filenames <- list.files(pattern="packs*.*csv")
dat <- map_dfr(filenames, read_xlsx, sheet = "summary") %>% glimpse()

How would I go about adding a field to show which file a given row came from? Thanks for any insight that can be offered!


Solution

  • Supposing the two packs.xlsx files are in different subfolders:

    library(readxl)
    
    filenames <- list.files(pattern = "packs.xlsx", recursive = TRUE)
    df <- lapply(filenames, function(fn) {
      # get the sheet detail 
      xl <- read_excel(fn, sheet = "summary")    
      
      # add the filename as a field
      xl$filename <- fn            
    
      # function return                     
      xl
    })
    
    # if both summary sheets have the same format, you can combine them into one 
    fin <- do.call(rbind, df)