Search code examples
rexcelgdatarodbc

Impossibility of loading a "Excel 5.0/7.0 (BIFF5) format" file in R


There's a multitude of ways to load Excel files in R, as reported here: http://www.milanor.net/blog/?p=779

However, I've tried most of the options out there (RODBC, xlsx, gdata, XLConnect) and just cannot make R load this particular file, published by the British government in 2013:

http://www.ons.gov.uk/ons/rel/npp/national-population-projections/2012-based-projections/rft-table-a3-4-principal-projection---england-population-single-year-of-age.xls

Here's an example of my failed attempts:

# save the file
download.file("http://www.ons.gov.uk/ons/rel/npp/national-population-projections/2012-based-projections/rft-table-a3-4-principal-projection---england-population-single-year-of-age.xls", destfile = "input-data/future-pop-ons.xls")

library(RODBC)
XLConnect::readWorksheetFromFile(file = "input-data/future-pop-ons.xls", sheet = 3)
## Error: OldExcelFormatException (Java): The supplied spreadsheet seems to be Excel 5.0/7.0 (BIFF5) format. POI only supports BIFF8 format (from Excel versions 97/2000/XP/2003)

library(XLConnect)
XLConnect::readWorksheet("input-data/future-pop-ons.xls", sheet = 3)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘readWorksheet’ for signature ‘"character", "numeric"’

library(gdata)
read.xls(xls = "input-data/future-pop-ons.xls", sheet = 3)
Use of uninitialized value $format in substitution (s///) at /home/robin/R/i686-pc-linux-gnu-library/3.1/gdata/perl/Spreadsheet/ParseExcel/Utility.pm line 183.

Would be great to be able to load this type of file directly, ensuring reproducibility.


Solution

  • This works for me on a rocker/rstudio container, which I'm using for a high level of isolation and reproducibility:

    download.file("http://www.ons.gov.uk/ons/rel/npp/national-population-projections/2012-based-projections/rft-table-a3-4-principal-projection---england-population-single-year-of-age.xls", 
                  destfile = "future-pop-ons.xls", 
                  method = "wget")
    
    library(gdata)
    xx <- read.xls(xls = "future-pop-ons.xls", sheet = 3, fileEncoding="latin1")
    

    There's a lot of warning-like output in the console, but the full sheet reads in, and that's the main thing.