
Combine paste0 with read_html and quotes


I'm trying to pull in data using read_html, which takes a URL. I build the URL string with paste0 and store it in x. Because I wrapped the URL in literal quotation marks during the concatenation, the printed value contains backslashes, so I use print with quote = FALSE to get rid of them (see the console output below).

Once I plug x into the rest of the read_html command, I get an error. Is there a better way to go about this?

Update:

> x<-paste0('"https://www.govtrack.us/congress/bills/', bills[15821,4],"/",bills[15821,1],'"')
> x
[1] "\"https://www.govtrack.us/congress/bills/118/HR8774\""
> print(x, quote = FALSE)
[1] "https://www.govtrack.us/congress/bills/118/HR8774"
> read_html(print(x, quote=FALSE)%>% html_nodes("#UserPositionModal+ p") %>% html_text())
[1] "https://www.govtrack.us/congress/bills/118/HR8774"
Error in UseMethod("xml_find_all") : 
  no applicable method for 'xml_find_all' applied to an object of class "character"
> read_html("https://www.govtrack.us/congress/bills/118/hr8774")%>% html_nodes("#UserPositionModal+ p") %>% html_text()
[1] "Making appropriations for the Department of Defense for the fiscal year ending September 30, 2025, and for other purposes."

Solution

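  • You don't need an extra set of quotes when the variable itself is a string. The quotes R shows when it prints a character vector are only display; they are not part of the value. Adding literal quotes inside paste0 makes them part of the URL, and print(x, quote = FALSE) does not remove them: it only changes what is displayed and returns x unchanged. Separately, the error in the question comes from the parentheses: the pipe sits inside read_html(), so html_nodes() is applied to the character string rather than to the parsed page. Build the URL without the extra quotes and close read_html() before piping: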

    x <- paste0("https://www.govtrack.us/congress/bills/", "118", "/", "hr8774")
    
    x
    #[1] "https://www.govtrack.us/congress/bills/118/hr8774"
    
    rvest::read_html(x) |>
      rvest::html_nodes("#UserPositionModal+ p") |>
      rvest::html_text()
    
    #[1] "Making appropriations for the Department of Defense for the fiscal 
    #year ending September 30, 2025, and for other purposes."
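
To tie this back to the data frame in the question, here is a minimal sketch of building the URL from columns rather than hard-coded strings. The one-row `bills` data frame, its column names `bill_id` and `congress`, and the helper names `bill_url` and `bill_summary` are all assumptions made up for illustration; the question only shows positional indexing. Lowercasing the bill id simply mirrors the URL that worked in the question. The native pipe needs R 4.1 or later.

```r
# Hypothetical stand-in for the asker's `bills` data frame.
# Column names are assumptions; the question indexes by position.
bills <- data.frame(
  bill_id  = "HR8774",
  congress = 118,
  stringsAsFactors = FALSE
)

# Hypothetical helper: builds the URL with no embedded quote characters.
# tolower() mirrors the lowercase URL that worked in the question.
bill_url <- function(congress, bill_id) {
  paste0("https://www.govtrack.us/congress/bills/", congress, "/", tolower(bill_id))
}

url <- bill_url(bills$congress[1], bills$bill_id[1])

# Embedding literal quotes changes the string itself: two extra characters.
quoted <- paste0('"', url, '"')
extra_chars <- nchar(quoted) - nchar(url)

# print() only affects display; it returns its argument unchanged,
# so the embedded quotes are still there afterwards.
unchanged <- identical(print(quoted, quote = FALSE), quoted)

# Hypothetical helper wrapping the scraping step from the answer.
# Note that read_html() is closed before the pipe starts.
bill_summary <- function(url) {
  rvest::read_html(url) |>
    rvest::html_nodes("#UserPositionModal+ p") |>
    rvest::html_text()
}
```

Calling `bill_summary(url)` needs network access and the rvest package, and the CSS selector is taken from the question as-is, so it may stop matching if the page layout changes.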