Tags: r, xml, rcurl

R function won't modify global variable


I have a simple piece of R code that reads HTML data from a website, and I am trying to loop through the pages and collect the data from each one. I have used this code many times and it works: it appends the results from each page to an R variable. On this site, however, it won't work. Any ideas?

library(XML)
library(RCurl)


data <- NULL

getData <- function(url) {
  # For some reason I can't read directly from the site, so I fetch the page with RCurl first
  xData <- getURL(url)
  table <- data.frame(readHTMLTable(xData)$'NULL')
  data <- table
}

getData(url="https://steemdb.com/accounts/reputation?page=1")

Solution

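  • I think I know what is wrong.

    Change data <- table to data <<- table within your function.

    Inside a function, <- assigns to a variable in the function's own local environment, which is discarded when the function returns, so the global data is never touched. The <<- operator instead searches the enclosing environments for an existing variable named data and modifies it there. Here that is the global variable; if no such variable existed, it would be created in the global environment.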
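
The difference between the two operators can be seen in a small, self-contained sketch. The names counter, set_local, and set_global are made up for illustration and do not appear in the original code.

```r
counter <- 0

# `<-` creates a new variable local to the function; the global is untouched.
set_local <- function() {
  counter <- 1
  invisible(NULL)
}

# `<<-` looks for `counter` in the enclosing environments and modifies it there.
set_global <- function() {
  counter <<- 2
  invisible(NULL)
}

set_local()
after_local <- counter   # still 0

set_global()
after_global <- counter  # now 2
```

Because `<<-` changes state outside the function, it can make code harder to follow; returning a value from the function is usually the simpler design.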

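    I would suggest avoiding the global assignment altogether: have the function return the table and assign the result yourself, for example with rvest: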

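    library(rvest)
    # html_table() on a whole page returns a list of tables; [[1]] takes the first one
    # (assuming the first table on the page is the one you want)
    getData <- function(url) { html_table(read_html(url))[[1]] }

    data <- getData("https://steemdb.com/accounts/reputation?page=1")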
    

    Or, even better, loop over the pages and combine the results:

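    library(rvest)
    # Take the first table on each page (assumed to be the one of interest)
    getData <- function(url) { html_table(read_html(url))[[1]] }
    steemdb.url <- "https://steemdb.com/accounts/reputation?page="

    # Fetch pages 1 to 100, then stack the per-page tables into one data frame
    data <- lapply(1:100, function(i) getData(paste0(steemdb.url, i)))
    data <- do.call(rbind, data)
    View(data)

    Here 1:100 fetches the first 100 pages; adjust the range to cover the pages you need.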
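
As a rough offline sketch of the same paging pattern, the following swaps the network call for a stand-in function so it can be run without internet access. fake_page is a hypothetical placeholder for getData, and the two-row data frame it returns is invented purely to show how the per-page results are stacked.

```r
base_url <- "https://steemdb.com/accounts/reputation?page="

# Hypothetical stand-in for getData(): returns a small data frame
# instead of scraping the page.
fake_page <- function(url) {
  data.frame(url = url, row = 1:2, stringsAsFactors = FALSE)
}

pages <- 1:3
urls <- paste0(base_url, pages)

# One data frame per page, then stack them into a single data frame.
results <- lapply(urls, fake_page)
combined <- do.call(rbind, results)

nrow(combined)  # 6 (3 pages x 2 rows each)
```

Note that do.call(rbind, ...) expects every page to yield a data frame with the same columns; a page that returns something else would need to be handled separately.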