I’m trying to do a simple scrap in the table in the following url:
https://www.bcb.gov.br/controleinflacao/historicometas
By what i notice is that, When using rvest::read_html
or httr::GET
and when acessing the page source code i can't see anything related to the table, but when acessing google chrome developer tools, i can spot the table references in the elements tab.
Examble above is a simple code where i try to acess the content of the url and search of nodes that contain tables:
library( tidyverse )
library( rvest )
url <- “https://www.bcb.gov.br/controleinflacao/historicometas”
res <- url %>%
read_html( ) %>%
html_node( “table” )
this give me:
{xml_nodeset (0)}
opening the source code of the url mentioned we can see:
view-source:https://www.bcb.gov.br/controleinflacao/historicometas
Page Developer Tool table print
By what i have searched the question is that the scripts avaible in source code load the table. I have seen some solutions that use RSelenium, but i would like to know if there is some solution where i can scrap this table without using Rselenium.
Some other related StackOverflow questions:
Scraping webpage (with R) where all elements are placed inside an <app-root> tag
scraping table from a website result as empty
(Last one is a python example)
When dealing with dynamic sites, Network tab tends to be more useful than Inspector. And often you don't have to scroll through hundreds of requests or pages of minified javascript, you rather pick a search term from rendered page to identify the api endpoint that sent that piece of information. In this case searching for "Resolução CMN nº 2.615" pointed to the correct call, most of the site content (in pure html) was delivered as json.
library(tibble)
library(rvest)
historicometas <- jsonlite::read_json("https://www.bcb.gov.br/api/paginasite/sitebcb/controleinflacao/historicometas")
historicometas$conteudo %>%
read_html() %>%
html_element("table") %>%
html_table()
#> # A tibble: 27 × 7
#> Ano Norma Data Meta …¹ Taman…² Inter…³ Infla…⁴
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1999 Resolução CMN nº 2.615 30/6… 8 2 6-10 8,94
#> 2 2000 Resolução CMN nº 2.615 30/6… 6 2 4-8 5,97
#> 3 2001 Resolução CMN nº 2.615 30/6… 4 2 2-6 7,67
#> 4 2002 Resolução CMN nº 2.744 28/6… 3,5 2 1,5-5,5 12,53
#> 5 2003* Resolução CMN nº 2.842Resolução … 28/6… 3,254 22,5 1,25-5… 9,309,…
#> 6 2004* Resolução CMN nº 2.972Resolução … 27/6… 3,755,5 2,52,5 1,25-6… 7,60
#> 7 2005 Resolução CMN nº 3.108 25/6… 4,5 2,5 2-7 5,69
#> 8 2006 Resolução CMN nº 3.210 30/6… 4,5 2,0 2,5-6,5 3,14
#> 9 2007 Resolução CMN nº 3.291 23/6… 4,5 2,0 2,5-6,5 4,46
#> 10 2008 Resolução CMN nº 3.378 29/6… 4,5 2,0 2,5-6,5 5,90
#> # … with 17 more rows, and abbreviated variable names ¹`Meta (%)`,
#> # ²`Tamanhodo intervalo +/- (p.p.)`, ³`Intervalode tolerância (%)`,
#> # ⁴`Inflação efetiva(Variação do IPCA, %)`
Created on 2022-10-17 with reprex v2.0.2