I would like to retrieve the CSS for specific HTML elements using rvest. For example, suppose I go to Google's homepage. If I inspect the webpage and navigate to the 7th input element, I can see that it has the attribute value = "Google Search". I can also see that attribute with the code below. However, rvest does not show me the CSS applied to the element (font family, color, etc.), even though I can see it when inspecting the page. How can I see the CSS using rvest?
library(rvest)
library(tidyverse)
url <- r'(https://www.google.com)'
# get html element for 'Google Search' text
search_text <-
read_html(url) %>%
html_elements('input') %>%
pluck(7)
# print it
search_text
#> {html_node}
#> <input class="lsb" value="Google Search" name="btnG" type="submit">
# show all attributes
search_text %>%
html_attrs()
#> class value name type
#> "lsb" "Google Search" "btnG" "submit"
Created on 2023-12-12 with reprex v2.0.2
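rvest only parses the HTML the server sends. It does not apply stylesheets the way the browser's inspector does, so there is no "computed style" to read. Two places where styling can still be read directly are a style attribute on the element itself and the URLs of linked stylesheets. Below is a minimal sketch on a small made-up page. The markup, the style="font-weight:bold" attribute, the /css/site.css path and the https://www.example.com base URL are all hypothetical:

```r
library(rvest)

# Hypothetical stand-in page so the sketch runs offline
page <- read_html('<html><head>
<link rel="stylesheet" href="/css/site.css">
</head><body>
<input class="lsb" style="font-weight:bold" value="Google Search">
</body></html>')

# Inline declarations on the element itself (NA if it has no style attribute)
inline_style <- page %>% html_element("input") %>% html_attr("style")

# Linked stylesheets, made absolute against a hypothetical base URL;
# their text would still have to be downloaded separately
sheet_urls <- page %>%
  html_elements('link[rel="stylesheet"]') %>%
  html_attr("href") %>%
  xml2::url_absolute("https://www.example.com")

inline_style
sheet_urls
```

On the real Google page, the input shown above has no style attribute, so its rules come from a stylesheet instead.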
The CSS for this class is stored in the body of the second <style> tag on the page. To extract it, we first get the input's class name from its class attribute:
library(rvest)
library(tidyverse)
url <- r'(https://www.google.com)'
css_class <- read_html(url) %>%
html_elements('input') %>%
pluck(7) %>%
html_attr('class')
css_class
#> [1] "lsb"
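The position of the stylesheet (the second <style> tag) can change whenever the page changes. Instead of hard-coding it, one option is to look for the <style> tag(s) whose text mentions the class. The minimal sketch below runs against a small made-up page instead of google.com so that it works offline. The markup and rules are hypothetical, and the sketch assumes the class attribute holds a single class name:

```r
library(rvest)

# Hypothetical stand-in page; on the real site you would use read_html(url)
page <- read_html('<html><head>
<style>body{margin:0}</style>
<style>.gb{color:#fff}.lsb{color:#000;height:30px}</style>
</head><body><input class="lsb" value="Google Search" type="submit"></body></html>')

css_class <- page %>% html_element("input") %>% html_attr("class")

# Text of every <style> tag, in document order
style_text <- page %>% html_elements("style") %>% html_text()

# Positions of the <style> tags that mention ".lsb" as a whole class name
# (not followed by another name character, so ".lsbx" would not count)
pattern <- paste0("\\.", css_class, "(?![A-Za-z0-9_-])")
which_style <- which(grepl(pattern, style_text, perl = TRUE))
which_style
```

The resulting index (or indices) can then replace the fixed pluck(2) below.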
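Getting the CSS itself requires taking the contents of the second <style> tag and doing some text parsing. The steps are to split the stylesheet into rules at each closing brace, keep the rule whose selector matches the class, and put each declaration on its own line: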
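read_html(url) %>%
html_elements('style') %>%
pluck(2) %>%                      # second <style> tag (specific to this page)
html_text() %>%                   # raw stylesheet text
strsplit('}', fixed = TRUE) %>%   # one chunk per rule
getElement(1) %>%
{.[nchar(.) > 1]} %>%             # drop empty chunks
paste0('}') %>%                   # restore the closing brace
{grep(paste0('\\.', css_class, '\\{'), ., value = TRUE)} %>%  # keep ".lsb{...}"
{gsub(';', ';\n', .)} %>%         # one declaration per line
cat()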
#> .lsb{background:url(/images/nav_logo229.png) 0 -261px repeat-x;
#> color:#000;
#> border:none;
#> cursor:pointer;
#> height:30px;
#> margin:0;
#> outline:0;
#> font:15px arial,sans-serif;
#> vertical-align:top}
Created on 2023-12-12 with reprex v2.0.2
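The pipeline above assumes a flat stylesheet and a selector written exactly as .lsb{. A slightly more general sketch is shown below as a hypothetical base-R helper, css_rules_for_class(). It also picks up selectors such as .lsb:hover and rules nested one level inside an @media block, although it drops the @media condition itself. It is still regex text processing rather than a CSS parser, so comments or quoted strings that contain braces will confuse it. The stylesheet text here is made up:

```r
# Hypothetical helper: return every rule in `css` whose selector mentions the
# class `.class_name`, including pseudo-classes such as `.lsb:hover`.
css_rules_for_class <- function(css, class_name) {
  # One chunk per closing brace; drop chunks with no opening brace
  chunks <- trimws(strsplit(css, "}", fixed = TRUE)[[1]])
  chunks <- chunks[grepl("{", chunks, fixed = TRUE)]
  # Selector = text just before the last "{" (an @media prefix is dropped)
  selectors <- trimws(sub("^(?:.*\\{)?([^{]*)\\{[^{]*$", "\\1", chunks, perl = TRUE))
  # Declarations = everything after the last "{"
  declarations <- sub("^.*\\{", "", chunks)
  # Whole class name only: ".lsb" must not be followed by a name character
  pattern <- paste0("\\.", class_name, "(?![A-Za-z0-9_-])")
  hit <- grepl(pattern, selectors, perl = TRUE)
  if (!any(hit)) return(character(0))
  paste0(selectors[hit], "{", declarations[hit], "}")
}

# Made-up stylesheet text for illustration
example_css <- paste0(
  ".gb{color:#fff}.lsb{color:#000;height:30px}.lsb:hover{color:#333}",
  ".lsbx{color:red}@media (max-width:600px){.lsb{height:24px}}"
)
cat(css_rules_for_class(example_css, "lsb"), sep = "\n")
#> .lsb{color:#000;height:30px}
#> .lsb:hover{color:#333}
#> .lsb{height:24px}
```

On the real page, the css argument would be the text returned by html_text() in the pipeline above.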