
web scraping RSelenium findElement


I feel this is supposed to be simple, but I have been struggling to get it right. I'm trying to extract the number of Employees ("2,300,000") from this webpage: https://fortune.com/company/walmart/

I used the Chrome extension SelectorGadget to locate the number: "info__row--7f9lE:nth-child(13) .info__value--2AHH7"

```
library(RSelenium)
library(rvest)
library(netstat)

rs_driver_object<-rsDriver(browser='chrome',chromever='103.0.5060.53',verbose=FALSE, port=free_port())
remDr<-rs_driver_object$client
remDr$navigate('https://fortune.com/company/walmart/')
Employees<-remDr$findElement(using = 'xpath','//h3[@class="info__row--7f9lE:nth-child(13) .info__value--2AHH7"]')
Employees
```
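One likely cause: the string SelectorGadget produces is a CSS selector, not an XPath, so wrapping it in `//h3[@class="..."]` will never match anything. A minimal sketch of passing it as a CSS selector instead (assuming the hashed class names, which the site's build can change at any time, are still current):

```r
# SelectorGadget output is a CSS selector, so use using = 'css selector'.
# NOTE: the hashed class names (info__row--7f9lE, info__value--2AHH7) are
# generated by the site and may change at any redeploy; treat them as fragile.
Employees <- remDr$findElement(
  using = 'css selector',
  '.info__row--7f9lE:nth-child(13) .info__value--2AHH7'
)
Employees$getElementText()
```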

An error says 

> "Selenium message:no such element: Unable to locate element".

I have also tried:
```
Employees<-remDr$findElement(using = 'class name','info__value--2AHH7')
```
But it does not return the data I want.


Can someone point out the problem? I'd really appreciate it!
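A note on that second attempt: `findElement` returns only the first element with the given class, and the page likely has several `info__value--2AHH7` nodes (one per info row), which would explain why the wrong value comes back. A sketch using `findElements` to pull all of them and inspect which position holds the employee count (again assuming the class name is current):

```r
# findElements returns a list of all matching nodes; extract each one's text
vals <- remDr$findElements(using = 'class name', 'info__value--2AHH7')
txt <- sapply(vals, function(el) el$getElementText()[[1]])
txt  # inspect which entry is the employee count
```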

Update: I modified the code as Frodo suggested in the comments below, to apply it to multiple webpages and save the statistics in a dataframe. But I still encountered an error.

```
library(RSelenium)
library(rvest)
library(netstat)

rs_driver_object <- rsDriver(browser = 'chrome', chromever = '103.0.5060.53',
                             verbose = FALSE, port = netstat::free_port())
remDr <- rs_driver_object$client

Data <- data.frame("url" = c("https://fortune.com/company/walmart/",
                             "https://fortune.com/company/amazon-com/",
                             "https://fortune.com/company/apple/",
                             "https://fortune.com/company/cvs-health/",
                             "https://fortune.com/company/jpmorgan-chase/",
                             "https://fortune.com/company/verizon/",
                             "https://fortune.com/company/ford-motor/",
                             "https://fortune.com/company/general-motors/",
                             "https://fortune.com/company/anthem/",
                             "https://fortune.com/company/centene/",
                             "https://fortune.com/company/fannie-mae/",
                             "https://fortune.com/company/comcast/",
                             "https://fortune.com/company/chevron/",
                             "https://fortune.com/company/dell-technologies/",
                             "https://fortune.com/company/bank-of-america-corp/",
                             "https://fortune.com/company/target/"))

# Pre-allocate the result column (html_text() returns character)
Data$numEmp <- NA_character_

for (i in seq_len(nrow(Data))) {
  remDr$navigate(url = Data$url[i])
  pgSrc <- remDr$getPageSource()
  pgCnt <- read_html(pgSrc[[1]])
  Data$numEmp[i] <- pgCnt %>%
    html_nodes(xpath = "//div[text()='Employees']/following-sibling::div") %>%
    html_text(trim = TRUE)
}
Data$numEmp
```

> Selenium message: unknown error: unexpected command response (Session info: chrome=103.0.5060.114) Build info: version: '4.0.0-alpha-2', revision: 'f148142cf8', time: '2019-07-01T21:30:10' System info: host: 'DESKTOP-VCCIL8P', ip: '192.168.1.249', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '1.8.0_311' Driver info: driver.version: unknown
>
> Error: Summary: UnknownError Detail: An unknown server-side error occurred while processing the command. class: org.openqa.selenium.WebDriverException Further Details: run errorDetails method

Can someone please take another look?


Solution

  • Use RSelenium to load the webpage and get the page source:

    remDr$navigate(url = 'https://fortune.com/company/walmart/')
    pgSrc <- remDr$getPageSource()
    

  • Use rvest to read the contents of the webpage:

    pgCnt <- read_html(pgSrc[[1]])
    

  • Further, use the rvest::html_nodes and rvest::html_text functions to extract the text with the relevant XPath selector (this Chrome extension should help):

    reqTxt <- pgCnt %>%
      html_nodes(xpath = "//div[text()='Employees']/following-sibling::div") %>%
      html_text(trim = TRUE)
    

  • Output of reqTxt:

    > reqTxt
    [1] "2,300,000"
    

    UPDATE

    The error Selenium message: unknown error: unexpected command response seems to occur specifically with version 103 of Chromedriver. More info here. One of the answers there suggested a simple wait of 5 seconds before and after the driver navigates to the URL. I have also used tryCatch inside a while loop so the code keeps retrying until the page loads. This seems to work.

    # Function to fetch the employee count; retries navigation until it succeeds
    getEmployees <- function(myURL) {
      pagestatus <<- 0
      while (pagestatus == 0) {
        tryCatch(
          expr = {
            remDr$navigate(url = myURL)
            pagestatus <<- 1
          },
          error = function(e) {
            pagestatus <<- 0
          }
        )
      }
      pgSrc <- remDr$getPageSource()
      pgCnt <- read_html(pgSrc[[1]])
      pgCnt %>%
        html_nodes(xpath = "//div[text()='Employees']/following-sibling::div") %>%
        html_text(trim = TRUE)
    }
    

    Apply this function to all of the URLs in your dataframe:

    for(i in 1:nrow(Data)) {
      Sys.sleep(5)
      Data[i, 2] <- getEmployees(Data[i, 1])
      Sys.sleep(5)
    }
    

    Now the second column holds the employee counts:

    > Data[, 2]
     [1] "2,300,000" "1,608,000" "154,000"   "258,000"   "271,025"   "118,400"  
     [7] "183,000"   "157,000"   "98,200"    "72,500"    "7,400"     "189,000"  
    [13] "42,595"    "133,000"   "208,248"   "450,000"
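For any downstream arithmetic, the comma-formatted strings can be converted to numbers. A small sketch using base R (the `numEmp_clean` column name is just an illustrative choice):

```r
# Strip the thousands separators and coerce to numeric
Data$numEmp_clean <- as.numeric(gsub(",", "", Data$numEmp, fixed = TRUE))
# e.g. "2,300,000" becomes 2300000
```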