Tags: xml, r, web-scraping, rvest, httr

rvest does not extract self-closing xml-nodes


I am trying to parse this XML file: http://data.fcc.gov/api/block/find?latitude=48.9905&longitude=-122.2733&showall=false

rvest/xml2 does not seem to recognize the nodes correctly:

require(rvest) #which uses xml2 internally
doc <- read_xml("http://data.fcc.gov/api/block/find?latitude=48.9905&longitude=-122.2733&showall=false")
> doc
{xml_document}
<Response>
[1] <Block FIPS="530730102002091"/>
[2] <County FIPS="53073" name="Whatcom"/>
[3] <State FIPS="53" code="WA" name="Washington"/>

When I try to get the County node, the call fails with an error (no matches):

doc %>% xml_node("County") # Error: No matches

I also tried read_html, and httr::GET combined with both read_html and read_xml. Any ideas?

P.S.: The example is taken from the question "Parsing an XML response to a query", which I tried to solve with rvest.


Solution

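  • The document declares a default namespace (xmlns), so an unprefixed node name such as County does not match. You can inspect the namespace with xml_ns() and use the prefix it reports (d1 here) in your XPath: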

    xml_find_first(doc, "//d1:County", xml_ns(doc))  # xml_find_one() in older xml2 releases
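
For anyone who wants to try this without calling the API, here is a minimal sketch that uses an inline copy of the response. The namespace URI in the string is an assumption about what the service returns, and the variable names are only illustrative. It contrasts the unprefixed lookup, the prefixed lookup, and (as an alternative that is not part of the answer above) stripping the default namespace with xml_ns_strip(), which needs a reasonably recent xml2.

```r
library(xml2)

# Inline stand-in for the API response.
# Assumption: the namespace URI below is a guess at what the service declares.
response_xml <- '<Response xmlns="http://data.fcc.gov/api" status="OK">
  <Block FIPS="530730102002091"/>
  <County FIPS="53073" name="Whatcom"/>
  <State FIPS="53" code="WA" name="Washington"/>
</Response>'

doc <- read_xml(response_xml)

# xml2 registers the default namespace under a generated prefix (d1)
ns <- xml_ns(doc)

# Unprefixed lookup: County lives in the default namespace, so nothing matches
no_ns <- xml_find_first(doc, "//County")

# Prefixed lookup: pass the namespace map so d1 can be resolved
county <- xml_find_first(doc, "//d1:County", ns)
county_name <- xml_attr(county, "name")
county_fips <- xml_attr(county, "FIPS")

# Alternative: drop the default namespace, then use plain node names.
# xml_ns_strip() modifies the document in place, so work on a second copy.
doc2 <- read_xml(response_xml)
xml_ns_strip(doc2)
county2 <- xml_find_first(doc2, "//County")
```

The prefixed form leaves the document untouched, which is usually the safer choice. Stripping the namespace is convenient for quick scraping, but it changes the parsed document.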