I got the following error when using the read_html
function from the xml2 package:
Error in open.connection(x, "rb") : HTTP error 404.
Here is the URL I attempted to read:
xml2::read_html("https://www.act.is/media-centre/press-releases/actis-energy-platform-zuma-energía-reaches-financial-close-on-two-further-solar-farms-in-mexico/")
By contrast, no error was generated when reading this URL:
xml2::read_html("https://www.act.is/media-centre/press-releases/actis-wins-cio-magazine-s-real-asset-award/")
The first URL contains a word with an accent mark ("energía"); the second URL does not. Is it possible to read URLs containing words with accent marks?
The URL contains a non-ASCII character (the "í" in "energía"), and you have to percent-encode it before making the request. In Python there are HTTP libraries for that; for R, you can find an equivalent here
Python example:
import requests
base_url = "https://www.act.is/media-centre/press-releases/"
# Percent-encode the path segment so the accented character is sent as escaped UTF-8 bytes
encoded_url = requests.utils.quote("actis-energy-platform-zuma-energía-reaches-financial-close-on-two-further-solar-farms-in-mexico/")
response = requests.get(base_url + encoded_url)
Encoded URL:
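To check what the quoting step produces without sending a request, here is a minimal sketch using the standard library's urllib.parse.quote (to my knowledge, requests.utils.quote is the same function re-exported). The names slug, encoded_slug and full_url are illustrative and not part of the original example.

```python
from urllib.parse import quote, unquote

base_url = "https://www.act.is/media-centre/press-releases/"
slug = "actis-energy-platform-zuma-energía-reaches-financial-close-on-two-further-solar-farms-in-mexico/"

# quote() percent-encodes the UTF-8 bytes of non-ASCII characters.
# "/" is left untouched by default (safe="/"), so the trailing slash survives.
encoded_slug = quote(slug)
full_url = base_url + encoded_slug

# The accented "í" is two bytes in UTF-8 (0xC3 0xAD), so it shows up as %C3%AD.
print(full_url)

# Decoding the result gives back the original text, so nothing is lost.
print(unquote(encoded_slug) == slug)
```

Only the path segment is quoted here. Passing the whole URL to quote() would also escape the ":" in "https:", which is why the base URL is kept separate and concatenated afterwards.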