I am trying to get all of the URLs in https://www.ato.gov.au/sitemap.xml (N.B. it's a ~9 MB file) using xml2. Any pointers appreciated.
library("xml2")
data <- read_xml("https://www.ato.gov.au/sitemap.xml")
xml_find_all(data, ".//loc")
I'm not getting the output I need:
{xml_nodeset (0)}
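This doesn't use xml2, but I was able to get the URLs using rvest. As far as I can tell it works because read_html() uses the HTML parser, which ignores XML namespaces, so a plain "loc" selector matches: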
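library(dplyr)  # loaded here for the %>% pipe
library(rvest)
url <- "https://www.ato.gov.au/sitemap.xml"
# Parse the sitemap as HTML, select every <loc> node, and extract its text
url %>%
  read_html() %>%
  html_nodes("loc") %>%
  html_text()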
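If you do want to stay with xml2, the empty nodeset is most likely a namespace issue. This assumes the ATO sitemap declares the standard sitemaps.org default namespace on its root element, which I have not verified against the live file. In that case an unprefixed XPath such as ".//loc" matches nothing. Here is a minimal sketch on a small inline sitemap (the example.com URLs and the variable names are made up for illustration) showing two ways around it:

```r
library(xml2)

# Tiny stand-in for the real sitemap (hypothetical URLs), using the
# standard sitemaps.org default namespace
sitemap_txt <- '<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc></url>
  <url><loc>https://example.com/b</loc></url>
</urlset>'

doc <- read_xml(sitemap_txt)

# Unprefixed XPath: <loc> lives in the default namespace, so nothing matches
n_plain <- length(xml_find_all(doc, ".//loc"))

# Option 1: xml2 exposes the default namespace under the prefix "d1"
urls_ns <- xml_text(xml_find_all(doc, ".//d1:loc", xml_ns(doc)))

# Option 2: strip the namespaces from a fresh copy, then use the plain XPath
doc_stripped <- read_xml(sitemap_txt)
xml_ns_strip(doc_stripped)
urls_stripped <- xml_text(xml_find_all(doc_stripped, ".//loc"))

print(urls_ns)
```

For the real file, replace sitemap_txt with the sitemap URL in read_xml(). Calling xml_ns(doc) first shows which prefix the default namespace was given.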