Search code examples
rfinancequantmodquantitative-finance

Format quantmod GetSymbol data so that multiple Symbols are in one data frame


I have some stock data that is formatted like this (got it from another source):

 Company,Date,Close
 AXP,2011-06-03,49.28
 AXP,2011-06-04,49.28
 CAT,2011-06-03,101.1
 CAT,2011-06-04,101.1

What I want to do is add the volume to that. If you download stock data using the GetSymbols command, volume is part of it.

The problem is that data obtained using the GetSymbols command is structured completely differently. See:

GetSymbols

Would it somehow be possible to change the format so that I have the data in the above format? I want to the outcome table to look like this:

 Company,Date,Close,Volume
 AXP,2011-06-03,49.28,5000000
 AXP,2011-06-04,49.28,3500000
 CAT,2011-06-03,101.1,1000000
 CAT,2011-06-04,101.1,3000000

Thanks in advance!!


Solution

  • One way you can do it:

    • create vector of tickers
    • initiate a new empty dataframe with column names corresponding with original data frame

    loop over ticker symbols:

    • fortify the xts created by getSymbols and extract the volume column and add a new Company column
    • rbind tickers
    • merge original dataframe with new dataframe by Company,Date columns

    The code snippet assumes that your imported Data is a dataframe object and named dfOrig

    tickers <- c('AXP','CAT')
    getSymbols(c("AXP","CAT"), from = '2018-03-01')
    dfNew <- data.frame(Company = character(),
                     Date=as.Date(character()),
                     Close=numeric(),
                     Volume=numeric(), 
                     stringsAsFactors=FALSE) 
    for(i in 1:length(tickers)){
      dfSym <- cbind(Company=rep(tickers[i],nrow(get(tickers[i]))),fortify.zoo(get(tickers[i])[,5]))
      names(dfSym)[2:3] <- c('Date','Volume')
      dfNew <- rbind(dfNew,dfSym)
    }
    

    The first few lines from dfNew:

    > head(dfNew)
      Company       Date  Volume
    1     AXP 2018-03-01 4369800
    2     AXP 2018-03-02 3584700
    3     AXP 2018-03-05 2661400
    4     AXP 2018-03-06 3221100
    5     AXP 2018-03-07 3594900
    6     AXP 2018-03-08 3760500
    

    Now you can merge the dataframes by Company andDate.

    > merge(dfOrig,dfNew,by = intersect(names(dfOrig), names(dfNew)))
       Company       Date  Close   Volume
    1      AXP 2018-03-01  95.14  4369800
    2      AXP 2018-03-02  95.60  3584700
    3      AXP 2018-03-05  96.66  2661400
    4      AXP 2018-03-06  96.07  3221100
    5      AXP 2018-03-07  95.64  3594900
    6      AXP 2018-03-08  96.59  3760500
    7      AXP 2018-03-09  99.05  3044300
    8      AXP 2018-03-12  97.67  4020900
    9      AXP 2018-03-13  95.37  4858700
    10     AXP 2018-03-14  94.53  3013800
    11     AXP 2018-03-15  94.39  3213000
    12     AXP 2018-03-16  95.61  5124900
    13     AXP 2018-03-19  94.84  2610400
    14     AXP 2018-03-20  94.95  2501600
    15     AXP 2018-03-21  94.74  2124000
    16     AXP 2018-03-22  91.41  4113900
    17     AXP 2018-03-23  90.45  5100400
    18     AXP 2018-03-26  92.62  2769800
    19     AXP 2018-03-29  93.28  4914600
    20     CAT 2018-03-01 150.23  9434800
    21     CAT 2018-03-02 146.38  9509400
    22     CAT 2018-03-05 151.12  8453000
    23     CAT 2018-03-06 153.75  5795600
    24     CAT 2018-03-07 151.51  5491000
    25     CAT 2018-03-08 153.59  4105000
    26     CAT 2018-03-09 158.25  4576700
    27     CAT 2018-03-12 154.50  4880400
    28     CAT 2018-03-13 153.69  4819100
    29     CAT 2018-03-14 152.54  4873600
    30     CAT 2018-03-15 154.57  5268800
    31     CAT 2018-03-16 156.46 10897600
    32     CAT 2018-03-19 152.15  5479200
    33     CAT 2018-03-20 154.06  3557600
    34     CAT 2018-03-21 155.80  3887400
    35     CAT 2018-03-22 146.90  8225600
    36     CAT 2018-03-23 144.29  6533100
    37     CAT 2018-03-26 149.19  5454900
    38     CAT 2018-03-27 146.99  4915300
    39     CAT 2018-03-28 145.16  4933900
    40     CAT 2018-03-29 147.38  4179000