I found an old thread (How do you read a password protected excel file into r?) that recommended that I use the following code to read in a password protected file:
install.packages("excel.link")
library("excel.link")
dat <- xl.read.file("TestWorkbook.xlsx", password = "pass", write.res.password="pass")
dat
However, when I try to do this my R immediately crashes. I've tried removing the write.res.password argument, and that doesn't seem to be the issue. I have a hunch that excel.link might not work with the newest version of R, so if you know of any other ways to do this I'd appreciate the advice.
EDIT: Using read.xlsx generates this error:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "newInstance", .jfindClass(class), :
org.apache.poi.poifs.filesystem.OfficeXmlFileException:
The supplied data appears to be in the Office 2007+ XML.
You are calling the part of POI that deals with OLE2 Office Documents.
You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
You can remove the password of the excel file without knowing it with the following function (adapted version of code available at https://www.r-bloggers.com/2018/05/remove-password-protection-from-excel-sheets-using-r/)
remove_Password_Protection_From_Excel_File <- function(dir, file, bool_XLSXM = FALSE)
{
initial_Dir <- getwd()
setwd(dir)
# file name and path after removing protection
if(bool_XLSXM == TRUE)
{
file_unlocked <- stringr::str_replace(basename(file), ".xlsm$", "_unlocked.xlsm")
}else
{
file_unlocked <- stringr::str_replace(basename(file), ".xlsx$", "_unlocked.xlsx")
}
file_unlocked_path <- file.path(dir, file_unlocked)
# create temporary directory in project folder
# so we see what is going on
temp_dir <- "_tmp"
# remove and recreate _tmp folder in case it already exists
unlink(temp_dir, recursive = TRUE)
dir.create(temp_dir)
# unzip Excel file into temp folder
unzip(file, exdir = temp_dir)
# get full path to XML files for all worksheets
worksheet_paths <- list.files(paste0(temp_dir, "/xl/worksheets"), full.name = TRUE, pattern = ".xml")
# remove the XML node which contains the sheet protection
# We might of course use e.g. xml2 to parse the XML file, but this simple approach will suffice here
for(ws in worksheet_paths)
{
file_Content <- readLines(ws, encoding = "windows1")
# the "sheetProtection" node contains the hashed password "<sheetProtection SOME INFO />"
# we simply remove the whole node
out <- str_replace(file_Content, "<sheetProtection.*?/>", "")
writeLines(out, ws)
}
worksheet_Protection_Paths <- paste0(temp_dir, "/xl/workbook.xml")
file_Content <- readLines(worksheet_Protection_Paths , encoding = "windows1")
out <- stringr::str_replace(file_Content, "<workbookProtection.*?/>", "")
writeLines(out, worksheet_Protection_Paths)
# create a new zip, i.e. Excel file, containing the modified XML files
old_wd <- setwd(temp_dir)
files <- list.files(recursive = T, full.names = F, all.files = T, no.. = T)
# as the Excel file is a zip file, we can directly replace the .zip extension by .xlsx
zip::zip(file_unlocked_path, files = files) # utils::zip does not work for some reason
setwd(old_wd)
# clean up and remove temporary directory
unlink(temp_dir, recursive = T)
setwd(initial_Dir)
}
Once the password is removed, you can read the Excel file. This approach works for me.