
How to read specific tags using XML2


Problem

I am trying to get all of the URLs in https://www.ato.gov.au/sitemap.xml (N.B. it's a ~9 MB file) using xml2. Any pointers appreciated.

My attempt

library("xml2")
data <- read_xml("https://www.ato.gov.au/sitemap.xml")
xml_find_all(data, ".//loc")

I'm not getting the output I need:

{xml_nodeset (0)}


Solution

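  • This doesn't use xml2 directly, but I was able to get the URLs using rvest. read_html() parses the file as HTML, which ignores XML namespaces, so the plain "loc" selector matches: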

    library(dplyr)
    library(rvest)
    
    url <- "https://www.ato.gov.au/sitemap.xml"
    
    url %>%
      read_html() %>%
      html_nodes("loc") %>%
      html_text()
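
If you would rather stay with xml2, the empty node set is probably caused by a default namespace on the root element: sitemap files normally declare xmlns="http://www.sitemaps.org/schemas/sitemap/0.9", and an unprefixed XPath like ".//loc" does not match namespaced nodes. Below is a minimal sketch of two ways around that. It runs on a small inline stand-in document (the example.com URLs are made up), and it assumes the real file uses the standard sitemap namespace, which I have not verified.

```r
library(xml2)

# Hypothetical stand-in for the real sitemap; assumes the standard
# sitemap default namespace is declared on the root element.
sitemap <- read_xml('<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc></url>
  <url><loc>https://example.com/b</loc></url>
</urlset>')

# Option 1: refer to the default namespace via the "d1" prefix
# that xml_ns() assigns to it.
urls_ns <- xml_text(xml_find_all(sitemap, ".//d1:loc", xml_ns(sitemap)))

# Option 2: strip the default namespace (modifies the document in place),
# after which the original unprefixed XPath works.
xml_ns_strip(sitemap)
urls_stripped <- xml_text(xml_find_all(sitemap, ".//loc"))
```

For the real file, replace the inline string with read_xml("https://www.ato.gov.au/sitemap.xml"). If the result is still empty, run xml_ns() on the document to see which namespaces it actually declares.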