Search code examples
rxlconnect

Reading Excel files with XLConnect returns "Error: InvalidFormatException (Java): Your InputStream was neither an OLE2 stream, nor an OOXML stream"


I am trying to read the contents of a score of Excel files into R with XLConnect. This is a simplified version of my code:

# point to a folder
path <- "/path/to/folder"

# get all the Excel files in that folder
files <- list.files(path, pattern = "*.xlsx")

# create an empty data frame
dat <- data.frame(var.1 = character(), var.2 = numeric())

# load XLConnect
library("XLConnect")

# loop over the files
for (i in seq_along(files)) {
    # read each Excel file
    wb <- loadWorkbook(paste(pfad, files[i], sep = "/"))
    # fill the data frame with data from the Excel file
    dat[i, 1:2] <- readWorksheet(wb, "Table1", startRow = 1, startCol = 1, endRow = 2, endCol = 1, header = FALSE)
    rm(wb)
}

I can read in a single file when I specify it with loadWorkbook(paste(pfad, files[1], sep = "/")), but when I loop over the file list with files[i], the code inside the for-loop returns the following error:

Error: InvalidFormatException (Java):
    Your InputStream was neither an OLE2 stream, nor an OOXML stream

What am I doing wrong?


Solution

  • The problem had nothing to do with my code.

    I had some of the files in that folder open in Excel. When you open a file in Excel, Excel creates an invisible file named "~$filename.xlsx". Since my regular expression searched for files with the suffix ".xlsx", these files were found, too, and since these files are not spreadsheet files, XLConnect couldn't read them and threw an error.

    I solved the problem by closing those files in Excel.

    Another solution would be to exclude files that begin with a tilde in the regular expression, with something like:

    list.files(path, pattern = "^[^~].+\\.xlsx")