r, web-scraping, download, corrupt

Downloading files in R


I'm trying to download a spreadsheet from the Australian Bureau of Statistics using download.file, but the file comes back corrupted, and my R session crashes when I try to open it with readxl.

target = "http://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206001_key_aggregates.xls&5206.0&Time%20Series%20Spreadsheet&24FF946FB10A10CDCA258192001DAC4B&0&Jun%202017&06.09.2017&Latest"
dest = 'downloaded_file.xlsx'

download.file(url = target, destfile = dest)

Any pointers would be great.


Solution

  • That file looks like an xls file, not the newer xlsx format. Remove the 'x' at the end of the destination filename so readxl knows to use the right format. Also, xls is a binary format, so the file should be written in binary mode (mode = 'wb'); the default text mode can corrupt binary downloads, particularly on Windows.

    target = "http://www.abs.gov.au/ausstats/meisubs.NSF/log?openagent&5206001_key_aggregates.xls&5206.0&Time%20Series%20Spreadsheet&24FF946FB10A10CDCA258192001DAC4B&0&Jun%202017&06.09.2017&Latest"
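    # Use the .xls extension so readxl picks the legacy (binary) reader
    dest = 'downloaded_file.xls'
    
    # mode = 'wb' writes the bytes unchanged instead of treating them as text
    download.file(url = target, destfile = dest, mode = 'wb')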
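If it is unclear which format the server actually sent, one option is to look at the first bytes of the downloaded file instead of trusting the extension. The sketch below assumes the standard file signatures (legacy xls files are OLE2 compound documents, xlsx files are zip archives). The helper name `sniff_excel_format` is made up for illustration and is not part of readxl.

```r
# Hypothetical helper (not a readxl function): guess the spreadsheet
# format from the file's leading bytes rather than from its name.
sniff_excel_format <- function(path) {
  # Read up to the first 8 bytes of the file as raw values
  sig <- readBin(path, what = "raw", n = 8)

  # OLE2 compound document signature, used by legacy .xls files
  ole2 <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  # Zip local file header signature, used by .xlsx files
  zip <- as.raw(c(0x50, 0x4B, 0x03, 0x04))

  if (length(sig) >= 8 && identical(sig[1:8], ole2)) {
    "xls"
  } else if (length(sig) >= 4 && identical(sig[1:4], zip)) {
    "xlsx"
  } else {
    # Neither signature matched: possibly an HTML error page
    # or a download mangled by text-mode writing
    NA_character_
  }
}
```

After downloading with mode = 'wb', calling `sniff_excel_format(dest)` gives a quick sanity check before handing the file to `readxl::read_excel(dest)`. If it returns NA, the file is probably not a spreadsheet at all, and opening it is unlikely to work.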