Search code examples
rxlsxpurrrreadxlopenxlsx

xlsx R: looping through list of files to check all sheet names; create a blank sheet if it does not exist


I am attempting to write a loop through a file of Excel workbooks, checking the names of all sheets to ensure that each excel workbook has three sheet names in it ('sheet1', 'sheet2', 'sheet3' - in any order). If one of those sheet names is missing, I want to create the missing sheet (blank) in the workbook missing the sheet.

  1. I am currently stuck on making sure the workbooks' sheet names match (in the list 'result') - I tried the script below but 'correct_sheet' returns empty
  2. For adding sheets, can addWorksheet (openxlsx) be used in a loop? Would map (purr) be better than a for loop / can map be applied to workbooks in a list?

I've never worked across multiple files like this and I'm open to any and all suggestions / approaches!

library(tidyverse)
library(readxl)
'%notin%' <- Negate('%in%')


all_files <- list.files(path="./path/to/files'", pattern = "*.xlsx", full.names = T)

#write files to a list to see the sheet names
result <- lapply(all_files, function(x) {
  all_sheets <- excel_sheets(x)  
}

#filter by which workbooks match criteria
correct_sheet <- lapply(result, FUN = function(x) {
  correct_sheet <- intersect(result, c('sheet1', 'sheet2', 'sheet3') )
})

#Add worksheet in a loop
for (result %notin% correct_sheet){

  
}



Solution

  • I'm old skool. I like a for loop. That's what happens when you come from other languages!

    So I'd be skipping all that lapply which doesn't explain what is going on and doing

    new_data = "" # a placeholder for the data you will insert in an new empty sheet
    
    for (file %in% all_files) {
    # open the file and get it's sheets
    sheets = excel_sheets(file)
    
    # check if the file has the sheet you want
    if ( "sheet1" %in% sheets) {
     # do nothing
    } else {
    # if not - create a sheet
    xlsx::write.xlsx(new_data,
     file, # You may need to add the path? 
     sheetName="sheet1",
     append=TRUE)
    } #if you are checking for all 3 sheets and adding any 3 missing, add another for loop?
    
    
    

    The R purists will say this is slow and inefficient. My question would be how fast does it need to be? How readable does it need to be?