I am attempting to write a loop through a file of Excel workbooks, checking the names of all sheets to ensure that each excel workbook has three sheet names in it ('sheet1', 'sheet2', 'sheet3' - in any order). If one of those sheet names is missing, I want to create the missing sheet (blank) in the workbook missing the sheet.
I've never worked across multiple files like this and I'm open to any and all suggestions / approaches!
library(tidyverse)
library(readxl)
'%notin%' <- Negate('%in%')
all_files <- list.files(path="./path/to/files'", pattern = "*.xlsx", full.names = T)
#write files to a list to see the sheet names
result <- lapply(all_files, function(x) {
all_sheets <- excel_sheets(x)
}
#filter by which workbooks match criteria
correct_sheet <- lapply(result, FUN = function(x) {
correct_sheet <- intersect(result, c('sheet1', 'sheet2', 'sheet3') )
})
#Add worksheet in a loop
for (result %notin% correct_sheet){
}
I'm old skool. I like a for loop. That's what happens when you come from other languages!
So I'd be skipping all that lapply which doesn't explain what is going on and doing
new_data = "" # a placeholder for the data you will insert in an new empty sheet
for (file %in% all_files) {
# open the file and get it's sheets
sheets = excel_sheets(file)
# check if the file has the sheet you want
if ( "sheet1" %in% sheets) {
# do nothing
} else {
# if not - create a sheet
xlsx::write.xlsx(new_data,
file, # You may need to add the path?
sheetName="sheet1",
append=TRUE)
} #if you are checking for all 3 sheets and adding any 3 missing, add another for loop?
The R purists will say this is slow and inefficient. My question would be how fast does it need to be? How readable does it need to be?