MWE:
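library(rvest)

html <- minimal_html('
<p id="name1"><font size=5>Here is size 5 font </font></p>
<p id="name2" class="second"><font size=3>And here is size 3 font </font></p>
')
# These work: select by id, by class, and by tag name
html %>% html_elements('#name1')
html %>% html_elements('.second')
html %>% html_elements('font')
# My attempts at selecting by size; '#' targets id and '.' targets class, so neither matches the size attribute
html %>% html_elements('#5')
html %>% html_elements('.5')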
My goal is to extract all elements with the attribute size=5. I know the easy way to do this when the attribute is "id" or "class" (as shown above), but I can't find a way to do it for the attribute "size". I tried both html_elements and html_nodes. Is there a way to do this with the rvest package?
Not sure how to do this with the CSS selectors if that's required, but here's some XPath that does the trick:
html %>% html_elements(xpath = '//font[@size=5]')
Output:
{xml_nodeset (1)}
[1] <font size="5">Here is size 5 font </font>
Or, for truly all elements with a size attribute of 5 (not just fonts):
html %>% html_elements(xpath = '//*[@size=5]')
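If a CSS selector is preferred over XPath, a CSS attribute selector of the form [attr="value"] should give the same result. The sketch below rebuilds the example document from the question; the variable names size5 and font5 are only for illustration.

```r
library(rvest)

html <- minimal_html('
<p id="name1"><font size=5>Here is size 5 font </font></p>
<p id="name2" class="second"><font size=3>And here is size 3 font </font></p>
')

# CSS attribute selector: any element whose size attribute equals "5"
size5 <- html %>% html_elements('[size="5"]')

# Same idea, restricted to <font> elements only
font5 <- html %>% html_elements('font[size="5"]')

# Inspect the matched nodes and confirm the attribute value
size5
html_attr(font5, "size")
```

The quotes around the value matter here: a CSS identifier cannot start with a digit, so the value 5 has to be written as the quoted string "5".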