
How can I get the href attribute from this website?

I'm trying to parse the HTML of this website, but when I extract the href attribute from the nodes that should be the links, I get "" for every node. What am I doing wrong?

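library(rvest)  # read_html(), html_nodes(), html_attr() and the %>% pipe

texto_01 <- read_html(URL)
# Narrow down to <a> elements inside li, inside ol, inside div, inside p
titulos_noticias <- texto_01 %>% html_nodes("p") %>% html_nodes("div") %>% html_nodes("ol") %>% html_nodes("li") %>% html_nodes("a")
titulos_noticias_texto <- html_attr(titulos_noticias, "href")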

Appreciate the help. Thanks a lot, Felipe


  • The content is loaded dynamically: the page runs a search request and then renders the result set with JavaScript, so the links are not in the HTML that read_html() downloads. You need to mimic that search request, which you can find in the Network tab of the browser's developer tools. The results are returned as JSON. The data of interest is in r$Rows, and each URL is built by concatenating parts:

    paste0("", item$TipodoNormativoOWSCHCS,'&numero=',as.integer(item$NumeroOWSNMBR))

    You can use paste0 and map_df to do this URL reconstruction in a loop over the JSON object returned in r$Rows.
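    A minimal, dependency-free sketch of that loop, assuming each element of r$Rows is a named list carrying the title, TipodoNormativoOWSCHCS and NumeroOWSNMBR fields used in this answer. The rows, base_url and build_row names are hypothetical placeholders (the real base URL is not shown here), and lapply() plus rbind() stands in for purrr::map_df():

    ```r
    # Hypothetical stand-in for r$Rows: a list of named lists, which is the
    # shape jsonlite::read_json() produces with its default simplifyVector = FALSE
    rows <- list(
      list(title = "Example A", TipodoNormativoOWSCHCS = "Tipo1", NumeroOWSNMBR = 4689),
      list(title = "Example B", TipodoNormativoOWSCHCS = "Tipo2", NumeroOWSNMBR = 12)
    )

    # Made-up placeholder for the page's real base URL
    base_url <- "https://example.com/busca?tipo="

    # Turn one search result into a one-row data frame
    build_row <- function(item) {
      data.frame(
        title = item$title,
        url = paste0(base_url, item$TipodoNormativoOWSCHCS,
                     "&numero=", as.integer(item$NumeroOWSNMBR)),
        stringsAsFactors = FALSE
      )
    }

    # Apply it to every result and stack the rows into one data frame
    df <- do.call(rbind, lapply(rows, build_row))
    ```

    With purrr (and dplyr) installed, map_df(rows, build_row) should give an equivalent data frame; the base R version just avoids the extra packages.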

    You can see the JavaScript that handles this process at line 6816 of the JS file listed in the Sources tab.


    Note that the JS uses a variable that has already been set at line 5609.
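    When mimicking the search request in R, the query string can be assembled from named parameters instead of one long literal, so that spaces, colons and accented characters such as "ã" get percent-encoded. A rough sketch, assuming the endpoint accepts a plain GET with the rowlimit, startrow, sortlist and refinementfilters parameters that appear in the request URL in this answer; the endpoint URL and the querytext parameter name are hypothetical placeholders, since the real ones come from the Network tab:

    ```r
    # Percent-encode each value (reserved = TRUE also encodes ":", "(", ",")
    # and join the name=value pairs with "&"
    build_query <- function(params) {
      pairs <- vapply(names(params), function(name) {
        value <- utils::URLencode(as.character(params[[name]]), reserved = TRUE)
        paste0(name, "=", value)
      }, character(1))
      paste(pairs, collapse = "&")
    }

    # Hypothetical endpoint; replace with the one shown in the Network tab
    endpoint <- "https://example.com/search"

    params <- list(
      querytext = "contentSource:normativos AND cess\u00e3o",  # hypothetical parameter name
      rowlimit = 15,
      startrow = 0,
      sortlist = "Data1OWSDATE:descending",
      refinementfilters = "Data:range(datetime(2018-09-17),datetime(2019-09-20T23:59:59))"
    )

    request_url <- paste0(endpoint, "?", build_query(params))
    ```

    If startrow is the offset of the first result, as its name suggests, setting params$startrow <- 15 and rebuilding request_url would presumably fetch the next page of 15 results.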



    library(purrr)  # for map_df()

    r <- jsonlite::read_json(' AND contentSource:normativos AND cessão&rowlimit=15&startrow=0&sortlist=Data1OWSDATE:descending&refinementfilters=Data:range(datetime(2018-09-17),datetime(2019-09-20T23:59:59))')
    df <- map_df(r$Rows, function(item) {
      data.frame(title = item$title,
                 url = paste0("", item$TipodoNormativoOWSCHCS,'&numero=',as.integer(item$NumeroOWSNMBR)),