I am trying to scrape a table from an HTML page using rvest in R, but html_node is not working. I think it is because the id in the XPath is in Spanish and contains an accented character.
Here is the code:
library(rvest)
url <- "https://www3.ine.gub.uy/boletin/Boletin%20Ingresos%204to%20trimestre%202021.html"
html <- read_html(url)
data <- html_node(html, xpath='//*[@id="ingreso-medio-per-cápita"]/table/tbody')
I have been Googling a lot but I cannot find a solution.
I would really appreciate it if someone could help me!
I'm not sure what the problem is here, but since the part of the id before the accented character is still unique, you can select the node with the XPath function starts-with:
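library(rvest)
url <- "https://www3.ine.gub.uy/boletin/Boletin%20Ingresos%204to%20trimestre%202021.html"
html <- read_html(url)
# Match on the ASCII prefix of the id, so the accented character never appears in the XPath
xpath <- '//div[starts-with(@id,"ingreso-medio-per-c")]/table'
# Parse the first matching table and keep its first three rows
data <- html_table(html_nodes(html, xpath = xpath))[[1]][1:3,]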
#> Warning in table_fill(cells, trim = trim): NAs introduced by coercion
#> # A tibble: 3 x 3
#>   ``         `Trimestre 3 2021` `Trimestre 4 2021`
#>   <chr>                   <dbl>              <dbl>
#> 1 Total país               25.8               26.6
#> 2 Montevideo               32.5               33.5
#> 3 Interior                 21.5               22.3
Created on 2022-02-15 by the reprex package (v2.0.1)
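If you want to check the starts-with approach without hitting the site, here is a minimal offline sketch. The HTML string is a made-up stand-in for the real page (only the id and a few values are copied from above; the column names Region, T3 and T4 are hypothetical), and it assumes a reasonably recent rvest.

```r
library(rvest)

# Hypothetical stand-in for the real page: a div whose id contains an accent.
# "\u00e1" is the escape for "á", so this script itself stays ASCII-only.
page <- read_html('
  <div id="ingreso-medio-per-c\u00e1pita">
    <table>
      <tr><th>Region</th><th>T3</th><th>T4</th></tr>
      <tr><td>Montevideo</td><td>32.5</td><td>33.5</td></tr>
      <tr><td>Interior</td><td>21.5</td><td>22.3</td></tr>
    </table>
  </div>')

# Select the div by the ASCII prefix of its id, then take its table child
xpath <- '//div[starts-with(@id, "ingreso-medio-per-c")]/table'
tbl <- html_table(html_node(page, xpath = xpath))

print(tbl)
```

Because the prefix passed to starts-with is plain ASCII, the match should not depend on how the accented character is encoded in your script or session.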