rvest html_nodes: selecting a span by its data-testid attribute

I'm scraping this HTML and I want to extract the text inside the <span data-testid="distance"> element:

<span class="class1">
<span data-testid="distance">the text i want</span>
</span>
<span class="class2">
<span class="class1"><span>the other text i'm obtaining</span>
</span>

distancia <- hoteles_verdes %>% 
  html_elements("span.class1") %>%
  html_text()

This code returns the text of every span.class1, including text I don't want. How can I select only the element with data-testid="distance" so that I can then call html_text() on it?

This is my first question here. Thanks!

Solution

You can use a CSS attribute selector.

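For example, the [attribute|="value"] selector matches elements whose attribute is exactly "value" or starts with "value-", so it picks up data-testid="distance" here; use [attribute="value"] if you need a strict match. Note the quoting: single quotes around the whole selector string, double quotes around the value inside it: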

library(rvest)

hoteles_verdes %>% 
  html_nodes('[data-testid|="distance"]') %>% 
  html_text()

Result:

[1] "the text i want"

Data:

hoteles_verdes <- read_html('<span class="class1">
                           <span data-testid="distance">the text i want</span>
                           </span>
                           <span class="class2">
                           <span class="class1"><span>the other text im obtaining</span>
                           </span>')
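
To make the difference between the two attribute selectors concrete, here is a minimal sketch. It assumes rvest 1.0.0 or later (for html_elements), and the second span with data-testid="distance-km" is a made-up addition to the sample HTML, not part of the original page:

```r
library(rvest)

# Hypothetical document: the "distance-km" span is invented for illustration.
doc <- read_html('<span class="class1">
                    <span data-testid="distance">the text i want</span>
                    <span data-testid="distance-km">a hyphenated variant</span>
                  </span>
                  <span class="class2">
                    <span class="class1"><span>the other text im obtaining</span></span>
                  </span>')

# [attr="value"] matches only when the attribute is exactly "distance".
exact <- doc %>%
  html_elements('[data-testid="distance"]') %>%
  html_text()

# [attr|="value"] also matches values that start with "distance-".
prefixed <- doc %>%
  html_elements('[data-testid|="distance"]') %>%
  html_text()

print(exact)
print(prefixed)
```

Here exact should contain only "the text i want", while prefixed should also include the hyphenated variant. If the page only ever uses the plain value, both selectors behave the same.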