Tags: r, phantomjs, plaintext, rselenium

RSelenium | get the text of the webpage


Is there a way to get the plain text from the remoteDriver in RSelenium? Something like remDr$getPlainText() as an equivalent to remDr$getPageSource().

Workaround:

I managed to save PhantomJS's plainText to a file as follows:

  library(RSelenium)
  pJS <- phantom()  # start PhantomJS (deprecated in newer RSelenium versions)
  Sys.sleep(5)      # give the binary a moment to start
  remDr <- remoteDriver(browserName = "phantomjs")
  remDr$open()
  # Register a PhantomJS callback: whenever a page finishes loading,
  # write its plain text to url.txt
  remDr$phantomExecute('var page = this;
                        var fs = require("fs");
                        page.onLoadFinished = function(status) {
                          var txtFile = fs.open("url.txt", "w");
                          txtFile.write(page.plainText);
                          txtFile.close();
                        };')

  remDr$navigate(some_url)

But then I still have to read the file back into R afterwards...

My workaround is similar to the one in the RSelenium headless vignette: https://cran.r-project.org/web/packages/RSelenium/vignettes/RSelenium-headless.html#id3b
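Reading the file back in can at least be wrapped in a small helper. The sketch below is hypothetical (read_plain_text is not part of RSelenium) and assumes the onLoadFinished callback has already written url.txt. Because fs.open is given a relative path, the file is created relative to PhantomJS's working directory, which may not be the same as R's.

```r
# Hypothetical helper (not an RSelenium function): read back the plain text
# that the PhantomJS onLoadFinished callback wrote to disk.
read_plain_text <- function(path = "url.txt") {
  if (!file.exists(path)) {
    stop("File not found: ", path, " (has the page finished loading?)")
  }
  lines <- readLines(path, warn = FALSE, encoding = "UTF-8")
  # Rejoin the lines into a single string, like page.plainText itself
  paste(lines, collapse = "\n")
}

# Usage, assuming remDr is set up as above:
# remDr$navigate(some_url)
# Sys.sleep(2)  # crude wait so onLoadFinished has a chance to fire
# txt <- read_plain_text("url.txt")
```

The fixed Sys.sleep() is only a rough guard against reading the file before the callback has run; polling file.exists() in a loop would be more robust.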


Solution

  • I am not sure whether it solves the problem, but you can get the page text by selecting the <body> element and asking for its text:

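    library(RSelenium)
    # checkForServer()/startServer() are defunct in newer RSelenium versions;
    # rsDriver() is the usual replacement there
    checkForServer()  # download the Selenium standalone server if needed
    startServer()     # start the Selenium server
    re <- remoteDriver()
    re$open()
    re$navigate("link")
    # getElementText() returns a list, so take its first element
    txt <- re$findElement(using = "css selector", value = "body")$getElementText()[[1]]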
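To get something close to the remDr$getPlainText() asked for in the question, the approach above can be wrapped in a function. This is a hypothetical helper, not an RSelenium method; it assumes an already opened remoteDriver and uses only findElement() and getElementText().

```r
# Hypothetical wrapper approximating the requested remDr$getPlainText():
# returns the rendered text of the element matched by `selector`
# (the whole <body> by default).
get_plain_text <- function(remDr, selector = "body") {
  el <- remDr$findElement(using = "css selector", value = selector)
  # getElementText() returns a list with one string; unwrap it
  el$getElementText()[[1]]
}

# Usage, assuming `re` is an opened remoteDriver:
# re$navigate("link")
# txt <- get_plain_text(re)
```

Since WebDriver reports an element's rendered text, content hidden with CSS is typically left out of the result.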