r web-scraping xpath rvest non-ascii-characters

rvest not returning html_nodes when the id in the XPath has an accent in R

I am trying to scrape a table from an HTML page using rvest in R, but html_node does not return the node. I think this is because the id in the XPath is in Spanish and contains an accented character.

Here is the code:

library(rvest)
library(xml2)

url <- "https://www3.ine.gub.uy/boletin/Boletin%20Ingresos%204to%20trimestre%202021.html"
html <- read_html(url)
data <- html_node(html, xpath='//*[@id="ingreso-medio-per-cápita"]/table/tbody')

I have been Googling a lot but I cannot find a solution.
I would really appreciate it if someone could help me!

Solution

I'm not sure what causes the problem here, but since the part of the id before the accented character is still unique, you can select the node with the XPath function starts-with:

library(rvest)
library(xml2)

url <- "https://www3.ine.gub.uy/boletin/Boletin%20Ingresos%204to%20trimestre%202021.html"
html <- read_html(url)

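# Match the div by the part of its id that precedes the accented character
xpath <- '//div[starts-with(@id, "ingreso-medio-per-c")]/table'
# Parse the first matching table and keep its first three rows
data <- html_table(html_nodes(html, xpath = xpath))[[1]][1:3, ]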
#> Warning in table_fill(cells, trim = trim): NAs introduced by coercion

data
#> # A tibble: 3 x 3
#>   ``         `Trimestre 3 2021` `Trimestre 4 2021`
#>   <chr>                   <dbl>              <dbl>
#> 1 Total país               25.8               26.6
#> 2 Montevideo               32.5               33.5
#> 3 Interior                 21.5               22.3

Created on 2022-02-15 by the reprex package (v2.0.1)
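
If you want to try the starts-with approach without hitting the live site, here is a minimal offline sketch. The HTML string below is a hypothetical miniature page that I made up for illustration (it is not the real INE bulletin), and the column names "Region" and "Value" are likewise assumptions. The accented character is written as the escape \u00e1 so the script does not depend on the encoding of the source file. Listing the id attributes first can also help you check how the accented id is actually stored in the parsed document.

```r
library(xml2)
library(rvest)

# Hypothetical miniature page (an assumption, not the real INE bulletin).
# "\u00e1" is the accented a in "capita".
doc <- read_html(paste0(
  '<html><body><div id="ingreso-medio-per-c\u00e1pita">',
  '<table><tr><th>Region</th><th>Value</th></tr>',
  '<tr><td>Montevideo</td><td>33.5</td></tr></table>',
  '</div></body></html>'
))

# Diagnostic: list the ids of all divs as the parser stored them
ids <- html_attr(html_nodes(doc, xpath = "//div[@id]"), "id")

# Select the table via the unaccented prefix of the id
xpath <- '//div[starts-with(@id, "ingreso-medio-per-c")]/table'
tbl <- html_table(html_node(doc, xpath = xpath))
```

Here ids holds the single id of the made-up page and tbl holds its one-row table. On the real page, the same diagnostic line applied to html shows the ids you can match against.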