I'm having some problems scraping data from a website, and I do not have a lot of experience with web scraping. My plan is to scrape some data using R from the following website: https://www.fatf-gafi.org/countries/
More precisely, I want to extract the list of countries that are subject to some sort of sanctions.
library(XML)
url <- paste0("https://www.fatf-gafi.org/countries/")
source <- readLines(url, encoding = "UTF-8")
parsed_doc <- htmlParse(source, encoding = "UTF-8")
But this doesn't bring up the intended information, because the data is not in a table but in nested div elements.
This is mostly a test of how JavaScript evaluation works with V8, the Embedded JavaScript and WebAssembly Engine for R:
https://cran.r-project.org/web/packages/V8/vignettes/v8_intro.html
Create a context engine, evaluate the requested JavaScript and get the value of the countries variable from V8 (it is turned into a nested data frame, hence the unnest()); the last row is filled with NAs, hence the filter.
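library(httr)
library(V8)
library(dplyr)
library(tidyr)

# Download the JavaScript file as text
url <- paste0('https://www.fatf-gafi.org/media/fatf/fatfv20/',
              'js/country-data-multi-lang.js')
js_content <- content(GET(url), 'text')

# Evaluate the script in a new V8 context and read back `countries`
ct <- v8()
ct$eval(js_content)
ct$get("countries") %>%
  unnest(cols = c(groups)) %>%    # expand the nested `groups` data frame
  select(c(1:2,4:14,16)) %>%
  filter(!is.na(name))            # drop the last row, which is all NA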
#> # A tibble: 209 × 14
#>    name       code  FATF  APG   CFATF EAG   ESAAMLG GABAC GAFILAT GIABA MENAFATF
#>    <chr>      <chr> <chr> <chr> <chr> <chr> <chr>   <chr> <chr>   <chr> <chr>
#>  1 Afghanist… AF    ""    "mbr" ""    "obs" ""      ""    ""      ""    ""
#>  2 Albania    AL    ""    ""    ""    ""    ""      ""    ""      ""    ""
#>  3 Algeria    DZ    ""    ""    ""    ""    ""      ""    ""      ""    "mbr"
#>  4 Andorra    AD    ""    ""    ""    ""    ""      ""    ""      ""    ""
#>  5 Angola     AO    ""    ""    ""    ""    "mbr"   ""    ""      ""    ""
#>  6 Anguilla   AI    ""    ""    "mbr" ""    ""      ""    ""      ""    ""
#>  7 Antigua a… AG    ""    ""    "mbr" ""    ""      ""    ""      ""    ""
#>  8 Argentina  AR    "mbr" "non" "non" "non" "non"   ""    "mbr"   "non" "non"
#>  9 Armenia    AM    ""    ""    ""    "obs" ""      ""    ""      ""    ""
#> 10 Aruba Kin… AW    "els" ""    "mbr" ""    ""      ""    ""      ""    ""
#> # … with 200 more rows, and 3 more variables: MONEYVAL <chr>,
#> #   jurisdiction <chr>, id <chr>
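
To see the eval/get round trip in isolation, without hitting the site, here is a minimal sketch on made-up data. The two entries and their field values are hypothetical; they only imitate the general shape of an array of objects with a nested groups object, and the real file may well differ. It also uses jsonlite::flatten() in place of unnest(), simply to avoid loading tidyr for a toy example.

```r
library(V8)
library(jsonlite)

# Hypothetical stand-in for the downloaded script: it just defines a variable.
# The names, codes and group values are invented for illustration.
js_content <- '
var countries = [
  {"name": "Aland",    "code": "AA", "groups": {"FATF": "mbr", "APG": ""}},
  {"name": "Borduria", "code": "BB", "groups": {"FATF": "",    "APG": "obs"}}
];'

ct <- v8()                         # new JavaScript context
ct$eval(js_content)                # run the script; `countries` now lives in the context
countries <- ct$get("countries")   # JS array of objects -> R data frame

# The nested `groups` object arrives as a data frame column;
# flatten() spreads it into ordinary columns prefixed with "groups."
flat <- jsonlite::flatten(countries)
names(flat)
```

The point is only that ct$get() serialises the JavaScript value to JSON and parses it on the R side, which is why nested objects come back as nested data frames and need flattening or unnesting.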