I'm trying to scrape the unemployment rate tables for 2017-2021. Before I scrape the tables, though, I want to figure out how to navigate to each year's page. This is what I have so far:
library(RSelenium)
library(rvest)
library(tidyverse)
library(netstat)

# start server
remote_driver <- rsDriver(browser = 'chrome',
                          chromever = '99.0.4844.51',
                          verbose = F,
                          port = free_port())

# create client object
rd <- remote_driver$client

# open browser
rd$open()

# maximize window
rd$maxWindowSize()

# navigate to page
rd$navigate('https://www.bls.gov/lau/tables.htm')

years <- c(2017:2021)
for (i in years) {
  rd$findElement(using = 'link text', years)$clickElement()
  Sys.sleep(3)
  rd$goBack
}
But it gives this error:
Selenium message:java.util.ArrayList cannot be cast to java.lang.String
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
Further Details: run errorDetails method
I was originally going to use rvest, but I couldn't figure out how to build a sequence of page URLs, since the links all end with .htm. On top of that, the main page's URL ends in /tables while the yearly links end in /lastrk, so it seemed easier to stick with Selenium.

So, any suggestions?
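As an aside, the Selenium error almost certainly comes from the `value` argument of `findElement`: the loop passes the whole `years` vector instead of the single year `i` as a string, and a vector is serialized as a JSON array, which the server cannot cast to a string (hence `java.util.ArrayList cannot be cast to java.lang.String`). The method `goBack` also needs parentheses to actually be called. A minimal sketch of the corrected loop, with the browser calls commented out so it runs stand-alone:

```r
years <- 2017:2021

for (i in years) {
  link_text <- as.character(i)  # findElement needs a single string, e.g. "2017"
  # rd$findElement(using = 'link text', value = link_text)$clickElement()
  # Sys.sleep(3)
  # rd$goBack()                 # goBack is a method: parentheses are required
}
```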
Get the tables of unemployment rates for metropolitan areas for the years 2016 to 2020. The links follow a consistent pattern, so we can generate them ourselves instead of clicking through the index page:
library(rvest)
library(dplyr)

df <- lapply(16:20, function(x) {
  # build the URL for each year's table page
  link <- paste0('https://www.bls.gov/lau/lamtrk', x, '.htm')
  # read the page and extract the tables with class "regular"
  tables <- link %>%
    read_html() %>%
    html_nodes('.regular') %>%
    html_table()
  # keep the first (and only) matching table
  tables[[1]]
})
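If you then want everything in one data frame, you can name the list elements by year and stack them with dplyr's `bind_rows`, whose `.id` argument turns the list names into a column. Toy tibbles stand in for the scraped tables here so the sketch runs offline; the same call applies to the list returned above:

```r
library(dplyr)

# stand-ins for two of the scraped yearly tables
df <- list(
  '2016' = tibble(area = c("A", "B"), rate = c(4.1, 3.8)),
  '2017' = tibble(area = c("A", "B"), rate = c(3.9, 3.6))
)

# stack into one data frame, keeping the list names as a `year` column
combined <- bind_rows(df, .id = "year")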