I'm having trouble with xmlToList, specifically with several CDATA fields in an API response.
I'm working with an API that returns either XML or JSON. I'm using XML::xmlToList to translate the XML-formatted API response into a list structure and RJSONIO's fromJSON to do the same with the JSON format.
The fromJSON output is exactly what I want, but I want to be able to get the same structure from the XML response.
The main issue is that xmlToList seems to discard the contents of fields if they're inside a CDATA wrapper.
Here's an example URL for the API (in XML): http://www.colourlovers.com/api/color/6B4106
And here's one in JSON: http://www.colourlovers.com/api/color/6B4106?format=json
As you can see in the first link, there are several fields with values stored in CDATA, like title.
<title>
<![CDATA[ wet dirt ]]>
</title>
If I parse the JSON response with fromJSON, I get the following:
List of 17
$ id : num 903893
$ title : chr "wet dirt"
$ userName : chr "jessicabrown"
$ numViews : num 323
$ numVotes : num 1
$ numComments: num 0
$ numHearts : num 0
$ rank : num 0
$ dateCreated: chr "2008-03-17 11:22:21"
$ hex : chr "6B4106"
$ rgb :List of 3
..$ red : num 107
..$ green: num 65
..$ blue : num 6
$ hsv :List of 3
..$ hue : num 35
..$ saturation: num 94
..$ value : num 42
$ description: chr ""
$ url : chr "http://www.colourlovers.com/color/6B4106/wet_dirt"
$ imageUrl : chr "http://www.colourlovers.com/img/6B4106/100/100/wet_dirt.png"
$ badgeUrl : chr "http://www.colourlovers.com/images/badges/c/903/903893_wet_dirt.png"
$ apiUrl : chr "http://www.colourlovers.com/api/color/6B4106"
The title field is just a character string, as desired. But using xmlToList, I get:
List of 17
$ id : chr "903893"
$ title :List of 1
..$ : NULL
$ userName :List of 1
..$ : NULL
$ numViews : chr "323"
$ numVotes : chr "1"
$ numComments: chr "0"
$ numHearts : chr "0"
$ rank : chr "0"
$ dateCreated: chr "2008-03-17 11:22:21"
$ hex : chr "6B4106"
$ rgb :List of 3
..$ red : chr "107"
..$ green: chr "65"
..$ blue : chr "6"
$ hsv :List of 3
..$ hue : chr "35"
..$ saturation: chr "94"
..$ value : chr "42"
$ description:List of 1
..$ : NULL
$ url :List of 1
..$ : NULL
$ imageUrl :List of 1
..$ : NULL
$ badgeUrl :List of 1
..$ : NULL
$ apiUrl : chr "http://www.colourlovers.com/api/color/6B4106"
Instead of returning either <![CDATA[ wet dirt ]]> or wet dirt, as I would expect, I just get a single-element list with NULL contents. How can I get xmlToList to handle the CDATA elements?
Here's the code:
xmlurl <- url('http://www.colourlovers.com/api/color/6B4106')
response1 <- paste(readLines(xmlurl, warn=FALSE), collapse='')
close(xmlurl)
jsonurl <- url('http://www.colourlovers.com/api/color/6B4106?format=json')
response2 <- paste(readLines(jsonurl, warn=FALSE), collapse='')
close(jsonurl)
str(XML::xmlToList(response1))
str(RJSONIO::fromJSON(response2))
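Have a look at XML:::parserOptions. Passing options = NOCDATA to xmlParse makes the parser merge CDATA sections into ordinary text nodes, which xmlToList can then read:
library(XML)
test <- xmlParse("http://www.colourlovers.com/api/color/6B4106", options = NOCDATA)
res <- xmlToList(test)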
> res$color$title
[1] "wet dirt"
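If you want to try this without hitting the API, here is a minimal sketch on an inline string. The XML below is a cut-down stand-in for the real response (its exact layout is an assumption based on the output shown in the question), and to_typed is a hypothetical helper, not part of the XML package. It also shows one possible way to get numbers back, since xmlToList returns every value as a character string whereas fromJSON returns numerics.

```r
library(XML)

# Cut-down stand-in for the API response (assumed layout, no whitespace between tags).
doc_text <- paste0(
  "<colors><color>",
  "<id>903893</id>",
  "<title><![CDATA[wet dirt]]></title>",
  "<numViews>323</numViews>",
  "<hex>6B4106</hex>",
  "</color></colors>"
)

# NOCDATA asks the parser to merge CDATA sections into plain text nodes,
# so xmlToList sees ordinary text under <title>.
doc <- xmlParse(doc_text, asText = TRUE, options = NOCDATA)
res <- xmlToList(doc)
str(res$color$title)

# Hypothetical helper: convert number-like strings so the result is closer
# to what fromJSON returns. Strings that are not numbers are left alone.
# Caveat: an all-digit hex code such as "000000" would also be converted.
to_typed <- function(x) {
  rapply(x, function(s) type.convert(s, as.is = TRUE),
         classes = "character", how = "replace")
}

typed <- to_typed(res)
str(typed$color)
```

Whether you want the conversion step depends on the fields involved; for this API you would probably want to skip it for hex, for the reason given in the comment.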