
html_element() in rvest: matching element by font size


MWE:

library(rvest)

html <- minimal_html('
    <p id="name1"><font size=5>Here is size 5 font </font></p>
    <p id="name2" class="second"><font size=3>And here is size 3 font </font></p>
')

html %>% html_elements('#name1')   # select by id
html %>% html_elements('.second')  # select by class
html %>% html_elements('font')     # select by tag name
# Failed attempts: '#' targets id and '.' targets class, not size
html %>% html_elements('#5')
html %>% html_elements('.5')

My goal is to extract all elements whose "size" attribute is 5. I know how to do this when the attribute is "id" or "class" (as shown above), but I can't find a way to do it for "size". (I tried both html_elements and html_nodes.) Is there a way to do this with the rvest package?


Solution

  • Not sure how to do this with the CSS selectors if that's required, but here's some XPath that does the trick:

    html %>% html_elements(xpath = '//font[@size=5]')
    

    Output:

    {xml_nodeset (1)}
    [1] <font size="5">Here is size 5 font </font>
    

    Or, to match every element with a size attribute of 5 (not just font elements):

    html %>% html_elements(xpath = '//*[@size=5]')
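
For the CSS route the answer leaves open, standard CSS attribute selectors of the form `[attr="value"]` should also work with html_elements(). The following is a minimal sketch rather than a definitive solution: it assumes rvest 1.0.0 or later is installed, rebuilds the question's `html` object so it runs on its own, and uses the illustrative names `size5` and `font5`, which are not from the original post.

```r
library(rvest)

# Same document as in the question
html <- minimal_html('
    <p id="name1"><font size=5>Here is size 5 font </font></p>
    <p id="name2" class="second"><font size=3>And here is size 3 font </font></p>
')

# CSS attribute selector: any element whose size attribute equals "5"
size5 <- html %>% html_elements('[size="5"]')

# Same idea, restricted to <font> elements
font5 <- html %>% html_elements('font[size="5"]')

# Inspect the matched text
html_text(size5)
```

The `[size="5"]` form corresponds to the `//*[@size=5]` XPath above, and `font[size="5"]` to `//font[@size=5]`, so either notation can be used depending on preference.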