Search code examples
rurldownloadxlssdmx

How to read IMF xls- or sdmx-data from url?


From the IMF I want to read a .xls file from an URL directly into R, but all attempts fail so far. Weirdly, I can download the file manually or by download.file() and open it without problems in Microsoft Outlook or in a text editor. However, even then I can't read the data into R.

I always try with both https and http.

myUrl <- "https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls"
myUrl2 <- "http://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls"

1. Classic approach – fails.

imf <- read.table(file=myUrl, sep="\t", header=TRUE)
# Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
#                 line 51 did not have 55 elements

imf <- read.table(file=url(myUrl), sep="\t", header=TRUE)
# Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
#                 line 51 did not have 55 elements

2. Several packages – fails.

imf <- readxl::read_xls(myUrl)
# Error: `path` does not exist: ‘https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls’

imf <- readxl::read_xls(myUrl2)
# Error: `path` does not exist: ‘http://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls’

imf <- gdata::read.xls(myUrl)
# Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method,  : 
#   Intermediate file 'C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f873be18e0.csv' missing!
# In addition: Warning message:
# In system(cmd, intern = !verbose) :
#   running command '"C:\STRAWB~1\perl\bin\perl.exe" 
#   "C:/Program Files/R/R-3.6.1rc/library/gdata/perl/xls2csv.pl" 
#   "https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls" 
#   "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f873be18e0.csv" "1"' had status 2
# Error in file.exists(tfn) : invalid 'file' argument

imf <- gdata::read.xls(myUrl2)  # <----------------------------------------------  THIS DOWNLOADS SOMETHING AT LEAST!
# trying URL 'http://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019all.xls'
# Content type 'application/vnd.ms-excel' length unknown
# downloaded 8.9 MB
#
# Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method,  : 
#   Intermediate file 'C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f87ded406b.csv' missing!
# In addition: Warning message:
# In system(cmd, intern = !verbose) :
#   running command '"C:\STRAWB~1\perl\bin\perl.exe" 
#   "C:/Program Files/R/R-3.6.1rc/library/gdata/perl/xls2csv.pl" 
#   "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f87f532cb3.xls" 
#   "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f87ded406b.csv" "1"' had status 255
# Error in file.exists(tfn) : invalid 'file' argument

3. Tempfile approach – fails.

temp <- tempfile()
download.file(myUrl, temp)  # THIS WORKS...

## BUT...
imf <- gdata::read.xls(temp)
# Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method,  : 
#   Intermediate file 'C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f870f55e04.csv' missing!
# In addition: Warning message:
# In system(cmd, intern = !verbose) :
#   running command '"C:\STRAWB~1\perl\bin\perl.exe"
#   "C:/Program Files/R/R-3.6.1rc/library/gdata/perl/xls2csv.pl" 
#   "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f8746a46db" 
#   "C:\Users\jay\AppData\Local\Temp\RtmpUtW45x\file16f870f55e04.csv" "1"' had status 255
# Error in file.exists(tfn) : invalid 'file' argument

# even not...
tmp1 <- readLines(temp)
# Warning message:
#   In readLines(temp) :
#   incomplete final line found on 
#   'C:\Users\jay\AppData\Local\Temp\Rtmp00GPlq\file2334435c2905'
str(tmp1)
#  chr [1:8733] "WEO Country Code\tISO\tWEO Subject Code\tCountry\tSubject 
#    Descriptor\tSubject Notes\tUnits\tScale\tCountry/Seri"| __truncated__ ...

4. SDMX

I also tried the SDMX the IMF offer, but also without success. Probably this would be a more sophisticated approach, but I never used SDMX.

link <- "https://www.imf.org/external/pubs/ft/weo/2019/02/weodata/WEOOct2019_SDMXData.zip"

temp <- tempfile()
download.file(link, temp, quiet=TRUE)
imf <- rsdmx::readSDMX(temp)
# Error in function (type, msg, asError = TRUE)  : 
#   Could not resolve host: C

# imf <- rsdmx::readSDMX(unzip(temp))  # runs forever and crashes R
unlink(temp)

Now... does anybody know what's going on, and how I may load the data into R?


Solution

  • Why not just use fill=TRUE?

    imf <- read.table(file=myUrl, sep="\t", header=TRUE, fill = TRUE)
    

    from ?read.table

    fill

    logical. If TRUE then in case the rows have unequal length, blank fields are implicitly added. See ‘Details’.