Scrape Anchored Website with RSelenium Package in R

I am fairly new to R and am having trouble pulling data from the Forbes website.

My current code is:

library(XML)

url <- "http://www.forbes.com/global2000/list/#page:1_sort:0_direction:asc_search:_filter:All%20industries_filter:All%20countries_filter:All%20states"

data <- readHTMLTable(url)

However, the Forbes URL contains an anchor (everything after the "#" symbol), and readHTMLTable() does not give me the data I want. I installed the RSelenium package to get at the data, but I am not well versed in RSelenium.
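Some background on why the "#" matters: under the URI rules, everything after "#" is a fragment that is not part of the request sent to the server. On pages like this one, the site's JavaScript reads the fragment after the page loads, so a plain HTTP fetch cannot ask for "page 2" just by changing it. A small base-R sketch that splits the URL into the part a server sees and the fragment (`split_fragment()` is a hypothetical helper for illustration):

```r
# Split a URL into the part sent to the server and the "#" fragment.
# split_fragment() is a hypothetical helper, not part of any package.
split_fragment <- function(url) {
  pos <- as.integer(regexpr("#", url, fixed = TRUE))  # -1 if there is no "#"
  if (pos == -1L) {
    return(list(request = url, fragment = ""))
  }
  list(
    request  = substr(url, 1, pos - 1),
    fragment = substr(url, pos + 1, nchar(url))
  )
}

url <- "http://www.forbes.com/global2000/list/#page:1_sort:0_direction:asc_search:_filter:All%20industries_filter:All%20countries_filter:All%20states"
parts <- split_fragment(url)
parts$request   # "http://www.forbes.com/global2000/list/"
parts$fragment  # starts with "page:1_sort:0"
```

Only `parts$request` identifies the document the server returns; the fragment only has an effect in something that runs the page's JavaScript, such as a browser driven by RSelenium.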

Does anyone have any advice or experience with RSelenium, and how I can use it to pull the data from Forbes? Ideally I want to pull data from pages 1, 2, etc. of the list.

Thanks!
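For the "page 1, 2, etc." part, here is a rough sketch of an RSelenium loop, not a tested scraper. It assumes the site still uses the `#page:N_...` fragment format from the question, that Firefox is available to `RSelenium::rsDriver()`, and that a fixed `Sys.sleep()` is long enough for the list to render. `forbes_page_url()` and `fetch_pages()` are hypothetical helper names.

```r
# Hypothetical helper: build the list URL for a given page by swapping the
# "page:N" part of the fragment (rest of the fragment copied from the question).
forbes_page_url <- function(page) {
  paste0(
    "http://www.forbes.com/global2000/list/#page:", page,
    "_sort:0_direction:asc_search:_filter:All%20industries",
    "_filter:All%20countries_filter:All%20states"
  )
}

# Hypothetical helper: drive a real browser with RSelenium so the page's
# JavaScript runs and the fragment takes effect. Needs the RSelenium package,
# a browser, and network access; returns the rendered HTML of each page.
fetch_pages <- function(pages, wait = 3) {
  rD <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  remDr <- rD$client
  on.exit({
    remDr$close()       # close the browser session
    rD$server$stop()    # stop the Selenium server started by rsDriver()
  })
  sources <- character(length(pages))
  for (i in seq_along(pages)) {
    remDr$navigate(forbes_page_url(pages[i]))
    Sys.sleep(wait)     # crude wait for the JavaScript to render the list
    sources[i] <- remDr$getPageSource()[[1]]
  }
  sources
}
```

Usage would be something like `html <- fetch_pages(1:2)`, then `rvest::read_html(html[1])` to parse each rendered page (for example with the `#thelist` extraction used in the answer). Waiting a fixed number of seconds is the simplest option; polling for the element is more robust but longer.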

Solution

It's a little hacky, but here's my solution using rvest and read.delim...

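library(rvest)  # also re-exports the %>% pipe

url <- "http://www.forbes.com/global2000/list/#page:1_sort:0_direction:asc_search:_filter:All%20industries_filter:All%20countries_filter:All%20states"

# Parse the page and take the text of the element with id "thelist"
# (read_html() replaces the older, deprecated html())
a <- read_html(url) %>%
  html_nodes("#thelist") %>%
  html_text()

# Read that text as tab-separated lines, skipping the first 12 lines
con <- textConnection(a)
df <- read.delim(con, sep = "\t", header = FALSE, skip = 12, stringsAsFactors = FALSE)
close(con)

# Where V1 is empty, take the value from V3; then drop V2 and V3 and empty rows
df$V1[df$V1 == ""] <- df$V3[df$V1 == ""]
df$V2 <- df$V3 <- NULL
df <- subset(df, V1 != "")

# Each company spans six consecutive rows: company, country, sales,
# profits, assets, market value
df$index <- seq_len(nrow(df))
df2 <- data.frame(company      = df$V1[df$index %% 6 == 1],
                  country      = df$V1[df$index %% 6 == 2],
                  sales        = df$V1[df$index %% 6 == 3],
                  profits      = df$V1[df$index %% 6 == 4],
                  assets       = df$V1[df$index %% 6 == 5],
                  market_value = df$V1[df$index %% 6 == 0],
                  stringsAsFactors = FALSE)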
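The `index %% 6` lines above assume every company occupies exactly six consecutive non-empty rows. Under that same assumption the reshape can be written with `matrix(..., byrow = TRUE)`, which also makes the assumption easy to check. A minimal sketch on made-up strings (placeholder values, not real Forbes data):

```r
# Made-up field values (NOT real Forbes data), one field per element, in the
# order company, country, sales, profits, assets, market value.
fields <- c("Company A", "Country X", "10 B", "1 B", "100 B", "50 B",
            "Company B", "Country Y", "20 B", "2 B", "200 B", "60 B")

# Each record is six consecutive fields, so fill a matrix row by row.
stopifnot(length(fields) %% 6 == 0)  # fail loudly if a record is incomplete
m <- matrix(fields, ncol = 6, byrow = TRUE)
df2 <- as.data.frame(m, stringsAsFactors = FALSE)
names(df2) <- c("company", "country", "sales", "profits", "assets", "market_value")
df2
```

In the answer's code, `fields` would be `df$V1` after the cleanup steps. The explicit length check matters because a misaligned row would otherwise shift every column for all later companies.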