How to get CSS information using rvest


I would like to retrieve the CSS for specific HTML elements using rvest. For example, suppose I go to Google's homepage. If I inspect the page and navigate to the 7th input element, I can see that it has the attribute value = "Google Search", and the code below shows the same thing. However, rvest does not show me the CSS applied to the element (font family, color, etc.), even though I can see it when inspecting the page. How can I see the CSS using rvest?

library(rvest)
library(tidyverse)

url <- r'(https://www.google.com)'

# get html element for 'Google Search' text
search_text <-
  read_html(url) %>% 
    html_elements('input') %>%
    pluck(7)

# print it
search_text
#> {html_node}
#> <input class="lsb" value="Google Search" name="btnG" type="submit">

# show all attributes
search_text %>% 
  html_attrs()
#>           class           value            name            type 
#>           "lsb" "Google Search"          "btnG"        "submit"

Created on 2023-12-12 with reprex v2.0.2


Solution

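  • rvest only parses the HTML source, so it does not report the styles a browser inspector computes for an element. On this page, the CSS for this class is stored in the body of the second <style> tag. To extract it, we first get the value of the input's class attribute: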

    library(rvest)
    library(tidyverse)
    
    url <- r'(https://www.google.com)'
    
    css_class <- read_html(url) %>% 
      html_elements('input') %>%
      pluck(7) %>%
      html_attr('class')
    
    css_class
    #> [1] "lsb"
    

    Getting the CSS then requires extracting the contents of the second style tag and parsing the text to find the rule that matches the desired class:

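    # Read the second <style> tag, split it into rules, and print the rule
    # that contains '.<class>{'
    read_html(url) %>% 
      html_elements('style') %>%
      pluck(2) %>%
      html_text() %>%
      strsplit('}') %>%          # split the stylesheet at each closing brace
      getElement(1) %>%
      {.[nchar(.) > 1]} %>%      # drop empty or one-character fragments
      paste0('}') %>%            # restore the closing brace on each rule
      {grep(paste0('\\.', css_class, '\\{'), ., value = TRUE)} %>%
      {gsub(';', ';\n', .)} %>%  # put each declaration on its own line
      cat()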
    #> .lsb{background:url(/images/nav_logo229.png) 0 -261px repeat-x;
    #> color:#000;
    #> border:none;
    #> cursor:pointer;
    #> height:30px;
    #> margin:0;
    #> outline:0;
    #> font:15px arial,sans-serif;
    #> vertical-align:top}
    

    Created on 2023-12-12 with reprex v2.0.2
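
The steps above depend on the rule living in the second style tag, which may change whenever the page does. As a rough sketch, the same idea can be wrapped in a helper that searches every style tag instead. The name css_rules_for_class is hypothetical (it is not part of rvest), and the inline HTML is a made-up stand-in for the downloaded page so the example runs offline. It assumes the class name contains no regex metacharacters and that the stylesheet has no nested blocks such as @media, which splitting on '}' would break.

```r
library(rvest)

# Hypothetical helper (not part of rvest): return the CSS rules from a page's
# <style> tags whose selector mentions the given class.
css_rules_for_class <- function(page, css_class) {
  # Collect the text of every <style> tag rather than assuming a position
  css_text <- paste(html_text(html_elements(page, "style")), collapse = "\n")

  # Split at each closing brace, then restore it on every non-empty rule
  rules <- trimws(strsplit(css_text, "}", fixed = TRUE)[[1]])
  rules <- paste0(rules[nzchar(rules)], "}")

  # The selector is everything before the first opening brace
  selectors <- vapply(strsplit(rules, "{", fixed = TRUE),
                      function(x) x[1], character(1))

  # Match ".<class>" as a whole class name, so ".lsb" does not match ".lsbx"
  pattern <- paste0("\\.", css_class, "([^A-Za-z0-9_-]|$)")
  rules[grepl(pattern, selectors)]
}

# Inline stand-in for a downloaded page (assumed markup, for illustration only)
page <- read_html('
  <html><head>
    <style>body{margin:0}</style>
    <style>.lsb{color:#000;font:15px arial,sans-serif}.lsbx{color:red}</style>
  </head><body>
    <input class="lsb" value="Google Search" name="btnG" type="submit">
  </body></html>')

css_class <- html_attr(html_element(page, "input"), "class")
lsb_rules <- css_rules_for_class(page, css_class)
cat(lsb_rules, sep = "\n")
#> .lsb{color:#000;font:15px arial,sans-serif}
```

For a live page, replacing the inline string with read_html(url) should work the same way, though this only finds rules in inline style tags. Rules loaded from external stylesheets or applied by JavaScript would not show up.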