I tried to cache the result of xml2's read_html to avoid flooding the server during development:
library(digest)
library(xml2)
url = "https://en.wikipedia.org"
cache = digest(url)
if (file.exists(cache)) {
cat("Reading from cache\n")
html = readRDS(cache)
} else {
#Sys.sleep(3)
cat("Reading from web\n")
html = xml2::read_html(url)
saveRDS(html, file = cache)
}
html
This fails because the saved object contains only external pointers, which are no longer valid when the script is re-run. The same problem occurs when I use memoise on read_html.
You can always use as_list and as_xml_document to convert back and forth:
library(digest)
library(xml2)
url = "https://en.wikipedia.org"
cache = digest(url)
if (file.exists(cache)) {
cat("Reading from cache\n")
html = as_xml_document(readRDS(cache))
} else {
cat("Reading from web\n")
html = read_html(url)
saveRDS(as_list(html), file = cache)
}
html
Alternatively, look into read_xml and write_xml.
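A minimal sketch of that file-based route, in case it helps. It assumes write_html (xml2's HTML counterpart to write_xml) is acceptable for an HTML page; cached_read_html and its cache_dir argument are hypothetical names made up for this example, not part of xml2. The idea is to store the serialized markup on disk and re-parse it in each session, so no external pointer is ever saved.

```r
library(digest)
library(xml2)

# Hypothetical helper: fetch `url` once, then reuse the copy saved on disk.
cached_read_html <- function(url, cache_dir = tempdir()) {
  # One cache file per URL, named by the hash of the URL string.
  cache <- file.path(cache_dir, paste0(digest(url), ".html"))
  if (file.exists(cache)) {
    cat("Reading from cache\n")
  } else {
    cat("Reading from web\n")
    # Save the serialized markup, not the R object that wraps the pointer.
    write_html(read_html(url), cache)
  }
  # Parsing the file creates a fresh, valid document in the current session.
  read_html(cache)
}
```

Calling cached_read_html("https://en.wikipedia.org", cache_dir = ".") twice should hit the server only on the first call. Note that the default cache_dir, tempdir(), is cleared when the R session ends, so pass a persistent directory if the cache should survive restarts.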