I am trying to scrape some financial reports from the SEC EDGAR database: https://www.sec.gov/oiea/Article/edgarguide.html
As the page is dynamic, I'm using RSelenium with Firefox, and now I'm a bit stuck. Firefox is open and I've navigated to the right page.
I'm after total assets.
remDr$navigate("https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000001800&type=10-K&dateb=&owner=include&count=40&search_text=")
webElem1 <- remDr$findElement(using = 'css selector', value = "#interactiveDataBtn")
webElem1$sendKeysToElement(list("\uE007"))
webElem2 <- remDr$findElement(using = 'css selector', value = "#menu_cat3")
webElem2$sendKeysToElement(list("\uE007"))
webElem3 <- remDr$findElement(using = 'css selector', value = "#r6 .xbrlviewer")
webElem3$sendKeysToElement(list("\uE007"))
webElem4 <- remDr$findElement(using = 'css selector', value = "#idp6852922048 > tbody > tr:nth-child(12) > td:nth-child(2)") %>% html_text() %>% as.numeric()
As you can see I'm trying to use a mix of RSelenium and rvest to retrieve the value but all I'm getting is:
Error in UseMethod("xml_text") :
no applicable method for 'xml_text' applied to an object of class "c('webElement',
'remoteDriver', 'errorHandler', 'envRefClass', '.environment', 'refClass', 'environment', 'refObject')"
Any ideas?
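If you check the class of what findElement returns (class(webElem4), without the pipe into html_text), you will see it is (not surprisingly) an object of class webElement. This is a special reference class (note the 'refClass' entry in your error message) with its own methods defined. You are passing it to html_text as if it were a parsed HTML node (an xml_node, as defined in xml2 and used by rvest).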
Although the two might seem quite similar, a webElement represents a pointer to an actively rendered node in a browser, not just an inert string of text to be parsed. The rvest and xml2 packages have no concept of what a webElement is or how to read it.
Fortunately, they don't need to. The webElement class has its own method for extracting text from the associated element. So in your case (using a full worked example with Chrome rather than Firefox):
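library(RSelenium)
# Start a Selenium server and a Chrome session; chromever selects the chromedriver
# version, which should be compatible with your installed Chrome
remDr <- rsDriver(port = 4567L, chromever = "84.0.4147.30")$client
remDr$navigate("https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000001800&type=10-K&dateb=&owner=include&count=40&search_text=")
# "\uE007" is the WebDriver code for the Enter key, sent here to activate each element
webElem1 <- remDr$findElement(using = 'css selector', value = "#interactiveDataBtn")
webElem1$sendKeysToElement(list("\uE007"))
webElem2 <- remDr$findElement(using = 'css selector', value = "#menu_cat3")
webElem2$sendKeysToElement(list("\uE007"))
webElem3 <- remDr$findElement(using = 'css selector', value = "#r6 .xbrlviewer")
webElem3$sendKeysToElement(list("\uE007"))
# Locate the table cell; the result stays a webElement, so nothing from rvest is piped on
webElem4 <- remDr$findElement(using = 'css selector', value = "#idp6852922048 > tbody > tr:nth-child(12) > td:nth-child(2)")
# getElementText() returns a list; [[1]] pulls out the character string
result <- webElem4$getElementText()[[1]]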
result
#> [1] "15,667"
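Two follow-ups, sketched under some assumptions. First, the text comes back as "15,667", and as.numeric() returns NA for a string containing a thousands separator, so the comma has to be removed before converting. Second, if you would rather keep rvest in the pipeline, you can hand it the rendered page source as a string instead of the webElement. In the sketch below, page_source is a made-up stand-in for what remDr$getPageSource()[[1]] would return in a live session, and the #demo table and its selector are purely illustrative, not the real EDGAR markup.

```r
library(rvest)

# 1. Convert the text returned by getElementText() into a number.
#    Assumes commas are only ever used as thousands separators.
result <- "15,667"
total_assets <- as.numeric(gsub(",", "", result, fixed = TRUE))
total_assets
#> [1] 15667

# 2. Alternative route through rvest: parse the page source, not the webElement.
#    Hypothetical stand-in for remDr$getPageSource()[[1]] in a live session.
page_source <- paste0(
  "<table id='demo'><tbody>",
  "<tr><td>Total assets</td><td>15,667</td></tr>",
  "</tbody></table>"
)

# read_html() turns the inert string into a parsed document that rvest can query
cell_text <- read_html(page_source) %>%
  html_node("#demo td:nth-child(2)") %>%
  html_text()

total_assets_rvest <- as.numeric(gsub(",", "", cell_text, fixed = TRUE))
```

The first approach is the shorter one when you only need a single cell. The second may be more convenient if you want to pull several values from the same rendered page, since the browser is only queried once.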