I am trying to use R to get data from an open data source in the Netherlands. The source is here.
When you open this in a browser (at least Chrome), it is presented as XML. So I thought I could fetch it with the RCurl package, parse it, and then use XPath to extract the specific nodes I need.
However, when I try to parse it, I run into problems. It does not seem to be straight XML; it appears to contain JSON.
How can I easily extract the information from this data source? I'm not looking for the full solution, just guidance in the right direction.
If I try:
library(RCurl)
library(XML)
url  <- "http://www.kiesbeter.nl/open-data/api/care/careproviders/?apikey=18a2b2b0-d232-4f48-8d10-5fc10ff04b17"
html <- getURL(url)                     # fetch the response body as a string
doc  <- htmlParse(html, asText = TRUE)  # try to parse it as HTML/XML
then it seems that doc is still in some JSON format, and I cannot use getNodeSet(doc, "//careproviders") on it.
However, if I use fromJSON first, I get an awkward nested list.
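Roughly what I tried for the JSON route, as a sketch (I used RJSONIO here; rjson and jsonlite provide a fromJSON with the same name, so the exact package may differ):

library(RJSONIO)
providers <- fromJSON(html)      # html is the response body fetched above
str(providers, max.level = 2)    # a deeply nested list rather than a tidy table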
So the question is: how can I treat this data so that I can easily extract the information I need (e.g. all care providers)? And how do I recognize what format the data is in?
Use
html <- getURL(url, httpheader = c(Accept = "text/xml"))
with an explicit Accept header to get XML with curl.
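For example, a minimal sketch of the whole XML route (the //careproviders XPath is just the one from your question; the actual node name in the response may differ):

library(RCurl)
library(XML)

url <- "http://www.kiesbeter.nl/open-data/api/care/careproviders/?apikey=18a2b2b0-d232-4f48-8d10-5fc10ff04b17"
xml <- getURL(url, httpheader = c(Accept = "text/xml"))   # ask explicitly for XML

doc   <- xmlParse(xml, asText = TRUE)                     # now a proper XML document
nodes <- getNodeSet(doc, "//careproviders")               # XPath query as in your question
length(nodes)                                             # number of matching nodes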
A little clarification: the service provides both XML and JSON formats, with JSON as the default. Your browser sends text/xml (among others) in the Accept header of its request, so the service returns XML. curl, by default, does not send an Accept header, so the service returns JSON, the default format.
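A quick way to see this content negotiation in action, as a sketch using only the calls already shown above:

library(RCurl)

url  <- "http://www.kiesbeter.nl/open-data/api/care/careproviders/?apikey=18a2b2b0-d232-4f48-8d10-5fc10ff04b17"
xml  <- getURL(url, httpheader = c(Accept = "text/xml"))  # Accept header set -> XML comes back
json <- getURL(url)                                       # no Accept header  -> JSON (the default)

substr(xml, 1, 60)    # starts with an XML declaration / opening tag
substr(json, 1, 60)   # starts with a JSON object or array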