I'm using RSelenium for scraping. For this, I have installed Java (the JDK), chromedriver, Selenium Server Standalone, and the headless browser PhantomJS on my Google Cloud VM instance.
I need to capture the text of the first rating:
remDr <- remoteDriver(browserName = 'chrome', port = 4444L)
remDr$open()
remDr$setWindowSize(1280L, 1024L)
remDr$navigate("https://www.ratebeer.com/reviews/sullerica-1561/294423")
text_post <- remDr$findElements("xpath", '//*[@id="root"]/div/div[2]/div/div[2]/div[2]/div/div[1]/div[3]/div/div[2]/div[1]/div/div[2]/div/div[1]/div/div/div[1]')
text_post
## list()
In the end, text_post is empty.
However, if I run the same script on my local laptop with RSelenium, the Chrome browser, and the same XPath, it succeeds!
What's going on?
Is it because I'm using PhantomJS?
sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS
As per the HTML, you can use the following XPath:
//div[@id="root"]//span[contains(.,'20')]//following::div[contains(@class,'LinesEllipsis')]
Note: As the elements are dynamically generated, you need to wait for them to become visible before locating them (the equivalent of Selenium's WebDriverWait in other bindings).
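RSelenium has no built-in WebDriverWait, so the usual substitute is a short polling loop. A minimal sketch, assuming the XPath above and a Selenium server already listening on port 4444 (the timeout and poll interval are arbitrary choices):

```r
library(RSelenium)

remDr <- remoteDriver(browserName = "chrome", port = 4444L)
remDr$open()
remDr$navigate("https://www.ratebeer.com/reviews/sullerica-1561/294423")

xpath <- "//div[@id='root']//span[contains(.,'20')]//following::div[contains(@class,'LinesEllipsis')]"

# Poll for up to 10 seconds, checking every half second,
# until the dynamically rendered element appears in the DOM.
elems <- list()
for (i in seq_len(20)) {
  elems <- remDr$findElements("xpath", xpath)
  if (length(elems) > 0) break
  Sys.sleep(0.5)
}

if (length(elems) > 0) {
  print(elems[[1]]$getElementText())
} else {
  message("Element not found within timeout")
}

remDr$close()
```

If the element still never appears under PhantomJS, dumping `remDr$getPageSource()` and searching it for the review text will tell you whether the page's JavaScript rendered at all in that browser.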