
How to check a valid API call in R? RCurl and httr not helping at the moment


I want to import JSON data from UN COMTRADE. I wrote a list of valid countries and years and loop over them, which works except when a given year contains no data for a country.

Given that, I want to test if my API call is valid, so I write this:

library(RCurl)

# this is an actual valid API call    
string = "http://comtrade.un.org/api/get?max=50000&type=C&freq=A&px=S2&ps=2010&r=4&p=all&rg=2&cc=AG4&fmt=json"

url.exists(string, useragent="curl/7.47.0 RCurl/1.95-4.8")

But even for valid country codes and years, where the result displays as JSON text in a web browser, R returns

url.exists(string, useragent="curl/7.47.0 RCurl/1.95-4.8")
[1] FALSE

With httr I do

library(httr)
!http_error(string)

and I obtain [1] FALSE

How can I fix that false negative result?


Solution

  • I took a peek at url.exists(), and then wrote this simpler version

    > g = basicTextGatherer()
    > x = curlPerform(url=string, headerfunction=g$update, nobody=TRUE)
    > g$value()
    [1] "HTTP/1.1 302 Moved Temporarily\r\nLocation: https://comtrade.un.org/api/get?max=50000&type=C&freq=A&px=S2&ps=2010&r=4&p=all&rg=2&cc=AG4&fmt=json\r\nCache-Control: no-cache\r\nPragma: no-cache\r\nDate: Thu, 23 Feb 2017 23:09:13 GMT\r\nAge: 0\r\nConnection: close\r\nVia: 1.1 localhost.localdomain\r\n\r\n"
    

    The http: url is being redirected to https:, so I tried

    > string = sub("^http:", "https:", string)
    > g = basicTextGatherer()
    > x = curlPerform(url=string, headerfunction=g$update, nobody=TRUE)
    > g$value()
    [1] "HTTP/1.1 405 Method Not Allowed\r\nCache-Control: no-cache\r\nPragma: no-cache\r\nAllow: GET\r\nContent-Length: 73\r\nContent-Type: application/json; charset=utf-8\r\nExpires: -1\r\nServer: Microsoft-IIS/7.5\r\nX-AspNet-Version: 4.0.30319\r\nX-Powered-By: ASP.NET\r\nDate: Thu, 23 Feb 2017 23:11:02 GMT\r\n\r\n"
    

    The HEAD method, implied by the curl option nobody, is not supported by this server (the Allow header in the response lists only GET). This is also why httr::http_error() reports an error: given a URL string, it performs a HEAD request. Not supporting HEAD is a decision made on the server side, so there is nothing that can be done on the user side.
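
    The status code and the Allow header can also be read out of the gathered header text programmatically rather than by eye. The following is a minimal sketch in base R; the helper names status_code() and allowed_methods() are hypothetical and not part of RCurl, and the header string is an abridged copy of the 405 response shown earlier.

    ```r
    # Hypothetical helpers (not part of RCurl): parse the raw header text
    # collected by basicTextGatherer() into the pieces of interest.

    # Extract the numeric status code from the first status line.
    status_code <- function(headers) {
        m <- regmatches(headers, regexpr("^HTTP/[0-9.]+ [0-9]{3}", headers))
        as.integer(sub("^HTTP/[0-9.]+ ", "", m))
    }

    # Extract the methods listed in the Allow header, if one is present.
    allowed_methods <- function(headers) {
        lines <- strsplit(headers, "\r\n", fixed = TRUE)[[1]]
        allow <- grep("^Allow:", lines, value = TRUE, ignore.case = TRUE)
        if (length(allow) == 0) return(character(0))
        value <- sub("^Allow:\\s*", "", allow[1], ignore.case = TRUE)
        trimws(strsplit(value, ",")[[1]])
    }

    # Abridged copy of the 405 header text from the session above.
    hdr <- "HTTP/1.1 405 Method Not Allowed\r\nCache-Control: no-cache\r\nAllow: GET\r\nContent-Length: 73\r\n\r\n"
    status_code(hdr)       # 405
    allowed_methods(hdr)   # "GET"
    ```

    A 405 status together with an Allow header that omits HEAD indicates that the resource may exist even though the HEAD-based check fails.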

    You could also try to request only the first bytes (e.g., RCurl::getURL(string, followlocation=TRUE, range="0-1"), which asks for bytes 0 and 1), but the server may ignore the range, as it does for this query, where the entire response is returned.

    So the only way I can test if the file actually exists is to retrieve it. I would use httr::GET(), maybe like

    tryCatch({
        response <- httr::GET(string)
        ## signal an error condition of class http_error on a 4xx or 5xx status
        httr::stop_for_status(response)
        ## ...
    }, http_error=function(e) {
        ## log error or otherwise recover
    })
    

    This is probably a more efficient solution anyway. If the query is successful, checking first and then performing the query requires two network calls, whereas performing the query without the check is only one network call. If the query fails, then under both approaches only a single network call is required, and the return value is similarly compact. So we save the latency induced by a network call under the most common scenario.
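
    Applied to the loop over countries and years from the question, the retrieve-and-recover pattern might look like the sketch below. The helper names comtrade_url() and fetch_or_null(), the example years and reporter codes, and the choice to return NULL on failure are assumptions made for illustration; only httr::GET(), httr::stop_for_status() and httr::content() are actual httr functions.

    ```r
    # Hypothetical helper: build the query URL for one year / reporter pair,
    # reusing the parameters from the question's example call.
    comtrade_url <- function(year, reporter) {
        sprintf(paste0("https://comtrade.un.org/api/get?max=50000&type=C&freq=A",
                       "&px=S2&ps=%d&r=%d&p=all&rg=2&cc=AG4&fmt=json"),
                as.integer(year), as.integer(reporter))
    }

    # Hypothetical helper: perform the GET once; return the response body as
    # text on success, or NULL (with a message) on an HTTP or connection error.
    fetch_or_null <- function(url) {
        tryCatch({
            response <- httr::GET(url)
            httr::stop_for_status(response)
            httr::content(response, as = "text", encoding = "UTF-8")
        }, error = function(e) {
            message("skipping ", url, ": ", conditionMessage(e))
            NULL
        })
    }

    # Example grid; the years and reporter codes here are placeholders.
    grid <- expand.grid(year = 2010:2011, reporter = c(4, 8))
    urls <- mapply(comtrade_url, grid$year, grid$reporter)

    # Not run here, since it needs network access:
    # results <- lapply(urls, fetch_or_null)
    # results <- Filter(Negate(is.null), results)
    ```

    Because the condition raised by stop_for_status() also inherits from error, the single error handler covers both HTTP failures and connection problems, and each combination costs at most one network call.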