
Programmatically look up a ticker symbol in R


I have a field of data containing company names, such as

company <- data.frame(Company = c("Microsoft", "Apple", "Cloudera", "Ford"))
> company

  Company
1 Microsoft
2 Apple
3 Cloudera
4 Ford

and so on.

The package tm.plugin.webmining allows you to query data from Yahoo! Finance based on ticker symbols:

require(tm.plugin.webmining)
results <- WebCorpus(YahooFinanceSource("MSFT")) 

I'm missing the in-between step. How can I look up ticker symbols programmatically based on company names?


Solution

  • I couldn't manage to do this with the tm.plugin.webmining package, but I came up with a rough solution: pull and parse the symbol directory at ftp://ftp.nasdaqtrader.com/SymbolDirectory/nasdaqlisted.txt. My first attempt used httr::content(httr::GET(...)), which didn't work every time. I suspect the ftp:// address is the cause, but I don't do much web scraping, so I can't really explain it. It seemed to work better on my Linux machine than on my Mac, though that may be irrelevant. Thanks to @thelatemail's comment, reading the file directly with read.csv() works much more smoothly:

    library(quantmod) ## not needed for read.csv(); used further down for getSymbols() and chartSeries()
    symbolData <- read.csv(
      "ftp://ftp.nasdaqtrader.com/SymbolDirectory/nasdaqlisted.txt",
      sep="|")
    ##
    > head(symbolData,10)
       Symbol                                                   Security.Name Market.Category Test.Issue Financial.Status Round.Lot.Size
    1    AAIT iShares MSCI All Country Asia Information Technology Index Fund               G          N                N            100
    2     AAL                    American Airlines Group, Inc. - Common Stock               Q          N                N            100
    3    AAME                    Atlantic American Corporation - Common Stock               G          N                N            100
    4    AAOI                    Applied Optoelectronics, Inc. - Common Stock               G          N                N            100
    5    AAON                                       AAON, Inc. - Common Stock               Q          N                N            100
    6    AAPL                                       Apple Inc. - Common Stock               Q          N                N            100
    7    AAVL                  Avalanche Biotechnologies, Inc. - Common Stock               G          N                N            100
    8    AAWW                     Atlas Air Worldwide Holdings - Common Stock               Q          N                N            100
    9    AAXJ               iShares MSCI All Country Asia ex Japan Index Fund               G          N                N            100
    10   ABAC                        Aoxin Tianli Group, Inc. - Common Shares               S          N                N            100
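
    With a table like this in hand, the in-between step from the question (company name to ticker) can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: lookupTicker() is a hypothetical helper, the three-row symbolSubset is a hand-made stand-in for the full symbolData (the MSFT row's security name is an assumption about how the listing reads), and it uses a plain case-insensitive prefix match rather than fuzzy matching, so the result is easy to predict.

```r
# Hand-made stand-in for the full symbolData so the sketch runs offline.
# The MSFT security name is an assumption about the real listing.
symbolSubset <- data.frame(
  Symbol = c("AAL", "AAPL", "MSFT"),
  Security.Name = c("American Airlines Group, Inc. - Common Stock",
                    "Apple Inc. - Common Stock",
                    "Microsoft Corporation - Common Stock"),
  stringsAsFactors = FALSE
)

# lookupTicker() is a hypothetical helper, not part of any package.
# It returns the symbol of the first listing whose name starts with the
# company name (case-insensitive), or NA when nothing matches.
lookupTicker <- function(name, table) {
  hits <- which(startsWith(tolower(table$Security.Name), tolower(name)))
  if (length(hits) == 0) NA_character_ else table$Symbol[hits[1]]
}

company <- c("Microsoft", "Apple", "Cloudera", "Ford")

# One ticker (or NA) per company name; the result is named by company.
tickers <- vapply(company, lookupTicker, character(1), table = symbolSubset)
print(tickers)
```

    Cloudera and Ford come back as NA here because neither appears in the subset. Against the full table, a prefix match can hit several listings, so taking only the first hit is a simplification you would probably want to revisit.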
    

    Edit: As per @GSee's suggestion, a (presumably) more robust way to obtain the source data is with the stockSymbols() function in the package TTR:

    > symbolData2 <- stockSymbols(exchange="NASDAQ")
    Fetching NASDAQ symbols...
    > ##
    > head(symbolData2)
      Symbol                                                           Name LastSale    MarketCap IPOyear         Sector
    1   AAIT iShares MSCI All Country Asia Information Technology Index Fun   34.556      6911200      NA           <NA>
    2    AAL                                  American Airlines Group, Inc.   40.500  29164164453      NA Transportation
    3   AAME                                  Atlantic American Corporation    4.020     83238028      NA        Finance
    4   AAOI                                  Applied Optoelectronics, Inc.   20.510    303653114    2013     Technology
    5   AAON                                                     AAON, Inc.   18.420   1013324613      NA  Capital Goods
    6   AAPL                                                     Apple Inc.  103.300 618546661100    1980     Technology
                             Industry Exchange
    1                            <NA>   NASDAQ
    2   Air Freight/Delivery Services   NASDAQ
    3                  Life Insurance   NASDAQ
    4                  Semiconductors   NASDAQ
    5 Industrial Machinery/Components   NASDAQ
    6          Computer Manufacturing   NASDAQ
    

    I don't know if you just wanted to get ticker symbols from names, but if you are also looking for actual share price information you could do something like this:

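    namedStock <- function(name="Microsoft",
                           start=Sys.Date()-365,
                           end=Sys.Date()-1){
      ## fuzzy-match the name against the Security.Name column;
      ## as.character() guards against Symbol being read as a factor
      ticker <- as.character(symbolData[agrep(name,symbolData[,2]),1])
      if (length(ticker) == 0) stop("no listing matches '", name, "'")
      ## one xts object is loaded per matching ticker
      getSymbols(
        Symbols=ticker,
        src="yahoo",
        env=.GlobalEnv,
        from=start,to=end)
    }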
    ##
    ## an xts object named MSFT will be added to
    ## the global environment, no need to assign
    ## to an object
    namedStock()
    ##
    > str(MSFT)
    An ‘xts’ object on 2013-09-03/2014-08-29 containing:
      Data: num [1:251, 1:6] 31.8 31.4 31.1 31.3 31.2 ...
     - attr(*, "dimnames")=List of 2
      ..$ : NULL
      ..$ : chr [1:6] "MSFT.Open" "MSFT.High" "MSFT.Low" "MSFT.Close" ...
      Indexed by objects of class: [Date] TZ: UTC
      xts Attributes:  
    List of 2
     $ src    : chr "yahoo"
     $ updated: POSIXct[1:1], format: "2014-09-02 21:51:22.792"
    > chartSeries(MSFT)
    


    As I said, this isn't the cleanest solution, but hopefully it helps you out. Also note that my data source only covers companies listed on NASDAQ. That includes many major companies but not all of them (Ford, for example, trades on the NYSE), so you may want to combine it with other sources.
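
    Combining sources mostly comes down to reducing each table to the same columns before stacking them. The sketch below assumes two small hand-made tables with different column names; the data frames nasdaq and other, and the Exchange labels, are illustrative stand-ins rather than real downloads.

```r
# Hand-made stand-ins for two symbol tables with different column layouts.
nasdaq <- data.frame(
  Symbol = c("AAPL", "MSFT"),
  Security.Name = c("Apple Inc. - Common Stock",
                    "Microsoft Corporation - Common Stock"),
  stringsAsFactors = FALSE
)
other <- data.frame(
  Symbol = "F",
  Name = "Ford Motor Company",
  stringsAsFactors = FALSE
)

# Reduce each table to the same columns, tag the origin, then stack them.
combined <- rbind(
  data.frame(Symbol = nasdaq$Symbol, Name = nasdaq$Security.Name,
             Exchange = "NASDAQ", stringsAsFactors = FALSE),
  data.frame(Symbol = other$Symbol, Name = other$Name,
             Exchange = "NYSE", stringsAsFactors = FALSE)
)

# A name lookup now covers both sources.
fordTicker <- combined$Symbol[grepl("^Ford", combined$Name)]
print(combined)
print(fordTicker)
```

    Keeping an Exchange column makes it possible to tell apart listings whose names match on more than one exchange.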