r, web-scraping, finance, stock

Web scraping of stock key stats from Finviz with R


I am trying to scrape some key stock stats from Finviz. I adapted the code from the original question: Web scraping of key stats in Yahoo! Finance with R. To collect stats for as many stocks as possible, I created a list of stock symbols and descriptions like this:

Symbol Description
A      Agilent Technologies
AAA    Alcoa Corp
AAC    Aac Holdings Inc
BABA   Alibaba Group Holding Ltd
CRM    Salesforce.Com Inc
...

I selected the first column and stored it as a character vector in R called stocks.
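
Roughly like this (just a sketch; the file name symbols.csv and the column name Symbol are placeholders for however the table above is stored):

    # read the symbol/description table (file name is hypothetical)
    symbols <- read.csv("symbols.csv", stringsAsFactors = FALSE)

    # keep the ticker column as a character vector
    stocks <- as.character(symbols$Symbol)

Then I applied the code: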

library(XML)

for (s in stocks) {
  url <- paste0("http://finviz.com/quote.ashx?t=", s)
  webpage <- readLines(url)
  html <- htmlTreeParse(webpage, useInternalNodes = TRUE, asText = TRUE)
  tableNodes <- getNodeSet(html, "//table")

  # ASSIGN TO STOCK NAMED DFS
  assign(s, readHTMLTable(tableNodes[[9]],
                          header = c("data1", "data2", "data3", "data4", "data5", "data6",
                                     "data7", "data8", "data9", "data10", "data11", "data12")))

  # ADD COLUMN TO IDENTIFY STOCK
  df <- get(s)
  df['stock'] <- s
  assign(s, df)
}

# COMBINE ALL STOCK DATA
stockdatalist <- cbind(mget(stocks))
stockdata <- do.call(rbind, stockdatalist)

# MOVE STOCK ID TO FIRST COLUMN
stockdata <- stockdata[, c(ncol(stockdata), 1:(ncol(stockdata) - 1))]

However, Finviz doesn't have a page for some of the stocks, and I get error messages like this:

Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open URL 'http://finviz.com/quote.ashx?t=AGM.A': HTTP status was '404 Not Found'

A good number of stocks are in this situation, so I can't delete them from my list manually. Is there a way to skip fetching the page for those stocks?


Solution

  • Maybe something along these lines? This tries to filter the stocks before running your for loop.

        library(tidyverse)

        # AGM.A should produce an error
        stocks <- c("AXP", "BA", "CAT", "AGM.A")
        urls <- paste0("http://finviz.com/quote.ashx?t=", stocks)

        # Test the urls with possibly() first and find out which came back as NA
        temp_ind <- map(urls, possibly(readLines, otherwise = NA_real_))
        ind <- map_lgl(map(temp_ind, c(1)), is.na)
        ind <- which(ind == TRUE)
        filter.stocks <- stocks[-ind]

        # AGM.A is removed, so you can feed only the stocks that work into your for loop
        filter.stocks
        [1] "AXP" "BA"  "CAT"
    

    As statxiong pointed out, url.exists from RCurl gives a simpler version:

    library(RCurl)
    library(tidyverse)

    # keep only the tickers whose quote page exists (stocks and urls as defined above)
    stocks[map_lgl(urls, url.exists)]
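
    If you would rather not request each page twice (once to check, once to scrape), another option is to catch the download error inside the loop itself and skip that ticker. This is only a sketch using base R's tryCatch(), not part of the answer above; it reuses the table index and header names from the question and collects the results in a list instead of assign()/get():

    library(XML)

    stockdatalist <- list()

    for (s in stocks) {
      url <- paste0("http://finviz.com/quote.ashx?t=", s)

      # readLines() throws an error on a 404 page; tryCatch() turns that
      # error into NULL so the loop can simply skip the missing ticker
      webpage <- tryCatch(readLines(url), error = function(e) NULL)
      if (is.null(webpage)) next

      html <- htmlTreeParse(webpage, useInternalNodes = TRUE, asText = TRUE)
      tableNodes <- getNodeSet(html, "//table")

      df <- readHTMLTable(tableNodes[[9]],
                          header = c("data1", "data2", "data3", "data4", "data5", "data6",
                                     "data7", "data8", "data9", "data10", "data11", "data12"))
      df$stock <- s
      stockdatalist[[s]] <- df
    }

    stockdata <- do.call(rbind, stockdatalist)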