Search code examples
rlistlapplyreadxl

Cannot read in datasets with readxl and lapply in R


I'm using readxl to import multiple files (12 monthly datasets for one year - e.g jan19, feb19) from a folder into my environment with the following code:

file.list <- list.files(pattern="xlsx$", recursive = TRUE)
df.list <- lapply(file.list, read_excel)

However, I'm getting the following error:

Error: Evaluation error: zip file '(my file path)' cannot be opened.

I've noticed that the file.list has 13 datasets instead of 12. The extra one is a dataset named ~jan19$.xlsx which readxl cannot read in since it doesn't exist. I was just wondering how to fix this?

Also - I have no problems opening these datasets using read_excel(dataset). None of the datasets are open in excel either.


Solution

  • Assuming you're using Windows, this indicates a temporary file:

    The tilde symbol is used to prefix hidden temporary files that are created when a document is opened in Windows. For example, when you open a Word document called “Document1.doc,” a file called “~$cument1.doc” is created in the same directory.

    These are created when you open an Excel file and should be cleared up when you close them, though this might not happen in some circumstances, such as if Excel crashes.

    Change your regex to exclude files starting with a tilde:

    file.list <- list.files(pattern="^[^\\~].+\\.xlsx$", recursive = TRUE)
    

    This pattern can be understood as:

    enter image description here

    This means files starting with a tilde will not be included in file.list so you should not have a problem trying to read them in.