I am trying to download a Word document from the web page below. When you press the button, the Word document downloads automatically, without showing any download link.
I copied the button's XPath and am now trying to use it to download the document from within R.
library(rvest)
# send an HTTP GET request to the URL
url <- "https://ec.europa.eu/taxation_customs/tedb/taxDetails.html?id=4205/1672527600"
page <- read_html(url)
# locate the link to the Word document using an XPath expression
doc_link <- page %>%
  html_nodes(xpath = '//*[@id="action_word_export"]') %>%
  html_attr("href")
Unfortunately, this does not work, and nothing is downloaded. Can anybody help me solve this problem and download the Word document from within R?
The problem is that the button triggers a JavaScript function that sends the download request itself, so there is no href attribute attached to the button. If you're open to using RSelenium, here's a way to download the file:
# load libraries
library(RSelenium)
# define target url
url <- "https://ec.europa.eu/taxation_customs/tedb/taxDetails.html?id=4205/1672527600"
# start RSelenium ------------------------------------------------------------
rD <- rsDriver(browser="firefox", port=4550L, chromever = NULL)
remDr <- rD[["client"]]
# open the remote driver-------------------------------------------------------
remDr$open()
# Navigate to webpage -----------------------------------------------------
remDr$navigate(url)
# click on the download button ------------------------------------
remDr$findElement(using = "xpath",value = '//*[@id="action_word_export"]')$clickElement()
The file should download to your default downloads folder.
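Note that clickElement() returns as soon as the click is sent, so a script can finish before the file is fully written. A minimal sketch of one way to wait for it, using only base R: wait_for_download() is a hypothetical helper, and both the "~/Downloads" path and the ".part" suffix Firefox uses for unfinished downloads are assumptions about your setup.

```r
# Hypothetical helper: poll `dir` until a file that was not in `before` shows up
# and no unfinished ".part" file is left. Returns the new path(s), or
# character(0) if nothing arrived before the timeout.
wait_for_download <- function(dir, before, timeout = 30, poll = 0.5) {
  deadline <- Sys.time() + timeout
  while (Sys.time() < deadline) {
    new_files <- setdiff(list.files(dir), before)
    partial <- grepl("\\.part$", new_files)
    if (any(!partial) && !any(partial)) {
      return(file.path(dir, new_files))
    }
    Sys.sleep(poll)
  }
  character(0)
}

# Usage with the RSelenium session above (download folder is an assumption):
# download_dir <- path.expand("~/Downloads")
# before <- list.files(download_dir)
# remDr$findElement(using = "xpath", value = '//*[@id="action_word_export"]')$clickElement()
# new_file <- wait_for_download(download_dir, before)
```

Taking the folder listing before the click is what lets the helper tell the new file apart from anything already in the folder.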
It's also possible that their download links follow a standard format. You can see which URL the JavaScript points to using the browser's web developer tools.
If you append that part to the base URL, you end up with a link that also downloads the file:
download_link <- paste0("https://ec.europa.eu/taxation_customs/tedb/",
"exportTax.html?taxId=4205&taxVersionDate=1672527600")
https://ec.europa.eu/taxation_customs/tedb/exportTax.html?taxId=4205&taxVersionDate=1672527600
There might be a pattern that would let you paste together your search criteria to generate download links directly, instead of using RSelenium.
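As a rough sketch of that idea, assuming every tax page follows the same pattern as the one example here (the id in the page URL is "taxId/taxVersionDate"), you could build the export link from the page URL and fetch it with base R's download.file(). The build_export_url() helper and the "tax_4205.docx" file name are made up for illustration, and the server may still refuse a plain request (for example if it expects cookies from a browser session).

```r
# Hypothetical helper: split the "id" query value of a taxDetails page URL
# ("<taxId>/<taxVersionDate>") and rebuild it as an exportTax link.
# Assumption: other tax pages use the same URL pattern as this one example.
build_export_url <- function(details_url) {
  id_part <- sub(".*[?&]id=", "", details_url)
  parts <- strsplit(id_part, "/", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 2)
  paste0("https://ec.europa.eu/taxation_customs/tedb/",
         "exportTax.html?taxId=", parts[1],
         "&taxVersionDate=", parts[2])
}

details_url <- "https://ec.europa.eu/taxation_customs/tedb/taxDetails.html?id=4205/1672527600"
download_link <- build_export_url(details_url)

# Only attempt the network request in an interactive session.
# mode = "wb" writes the file in binary mode, which Word files need on Windows.
# The ".docx" extension is a guess; rename the file if the server sends ".doc".
if (interactive()) {
  download.file(download_link, destfile = "tax_4205.docx", mode = "wb")
}
```

If this works for one page, the same function can be applied over a vector of page URLs with lapply().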