How to read data from a csv table in R?


I have a CSV file, and I want to extract each column as strings so I can use it with the getSymbols function from the quantmod package.

The csv file looks like this:

AEGR,Aegerion Pharmaceuticals Inc
AKS,AK Steel Holding Corp
ALXA‎,Alexza Pharmaceuticals Inc
CCL‎,Carnival Corporation
CECO‎,Career Education Corp
CDXS‎,Codexis Inc

And I use this code to read the file:

data<-read.csv(file='CAPM/allquotes.csv',header=F)
symbols=gettext(data[,1])
symbol.names=gettext(data[,2])
getSymbols(symbols)

I get this error:

Error in download.file(paste(yahoo.URL, "s=", Symbols.name, "&a=", from.m,  : cannot open URL 'http://chart.yahoo.com/table.csv?s=ALXA‎&a=0&b=01&c=2007&d=5&e=16&f=2012&g=d&q=q&y=0&z=ALXA‎&x=.csv'
In addition: Warning message:
In download.file(paste(yahoo.URL, "s=", Symbols.name, "&a=", from.m,  : cannot open: HTTP status was '404 Not Found'

When I enter the symbols one by one it works fine. I've also noticed that when I move the cursor to the end of the last line, the spacing looks corrupted. In the image, which shows the values of 'symbols', the end of the line sits a few spaces further to the right than it should (you can tell from the color of the opening parenthesis).

[Image: symbols object]


Solution

  • Your CSV has hidden characters in it, namely a left-to-right mark (U+200E) right after ALXA, CCL, CECO and CDXS. That character ends up in the URL that getSymbols requests, which is why Yahoo answers with a 404. You can remove it with gsub using "\u200e" as the value for the pattern argument. Alternatively, instead of removing the hidden character that you don't want, you could keep only the characters that you know you DO want. For example, if your symbols only contain letters and/or numbers, you could use something like gsub("[^A-Za-z0-9]", "", data[, 1])

    library(quantmod)

    # \u200e is the left-to-right mark, written as an escape so it stays visible here
    data <- read.csv(text="AEGR,Aegerion Pharmaceuticals Inc
    AKS,AK Steel Holding Corp
    ALXA\u200e,Alexza Pharmaceuticals Inc
    CCL\u200e,Carnival Corporation
    CECO\u200e,Career Education Corp
    CDXS\u200e,Codexis Inc", header=FALSE)
    #data[, 1] <- gsub("\u200e", "", data[, 1]) # removes only the left-to-right mark
    data[, 1] <- gsub("[^A-Za-z0-9]", "", data[, 1]) # keeps only letters and digits
    symbols <- as.character(data[, 1])
    getSymbols(symbols, src='yahoo')
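
The keep-only-what-you-want idea above can be wrapped in a small helper. This is only a sketch: clean_symbols and its keep argument are hypothetical names, not part of quantmod, and the default whitelist assumes your tickers may also contain ".", "^" or "-" (as in Yahoo symbols such as BRK-B or ^GSPC). Tighten it if yours never do.

```r
# Hypothetical helper: drop every character that is not in the whitelist.
# `keep` is the inside of a regex bracket expression; "-" stays last so it
# is treated literally.
clean_symbols <- function(x, keep = "A-Za-z0-9.^-") {
  pattern <- paste0("[^", keep, "]")
  gsub(pattern, "", as.character(x))
}

# Made-up input: a trailing left-to-right mark and stray spaces
raw <- c("ALXA\u200e", " AKS ", "BRK-B\u200e", "^GSPC")
cleaned <- clean_symbols(raw)
print(cleaned)
```

The result can then be passed to getSymbols in place of the raw column.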
    

    If you examine the data object right after read.csv, before any gsub cleanup, you can see that something is amiss.

    s <- as.character(data[, 1])
    str(s)
    #chr [1:6] "AEGR" "AKS" "ALXA""| __truncated__ "CCL""| __truncated__ "CECO""| __truncated__ "CDXS""| __truncated__
    str(s[3])
    #chr "ALXA""| __truncated__
    
    charToRaw(s[3])
    #[1] 41 4c 58 41 e2 80 8e
    # Compare what we have to what we think we have
    charToRaw("ALXA")
    #[1] 41 4c 58 41
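
The extra bytes e2 80 8e in the first output are the UTF-8 encoding of U+200E. If you would rather not read raw bytes, a rough way to find the affected entries is to look at code points instead. The sketch below uses a made-up vector s standing in for the first column of the file, and assumes the strings are valid UTF-8.

```r
# Made-up stand-in for data[, 1]; the third entry ends in U+200E
s <- c("AEGR", "AKS", "ALXA\u200e")

# One integer vector of Unicode code points per string
code_points <- lapply(s, utf8ToInt)

# TRUE where a string contains anything outside 7-bit ASCII
has_hidden <- vapply(code_points, function(cp) any(cp > 127L), logical(1))
print(which(has_hidden))

# Code points of the offending entry, formatted as U+XXXX
offending <- sprintf("U+%04X", code_points[[3]])
print(offending)
```

Any entry flagged here is worth cleaning before it is passed to getSymbols.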