Search code examples
rtime-seriesxtszooposixct

Trouble with converting csv file into xts using Highfrequency package in R


I'm using the convert function in Highfrequency package in R. The dataset I'm using is TAQ downloaded from WRDS. The data looks like This. The function convert suppose to convert the .csv into .RData files of xts objects.

I follow the instruction of the package and use the following code:

library(highfrequency)

from <- "2017-01-05"
to <- "2017-01-05"
format <- "%Y%m%d %H:%M:%S"

datasource <- "C:/Users/feimo/OneDrive/SFU/Thesis-Project/R/IBM"
datadestination <- "C:/Users/feimo/OneDrive/SFU/Thesis-Project/R/IBM"

convert( from=from, to=to, datasource=datasource,
         datadestination=datadestination, trades = T, quotes = F,
         ticker="IBM", dir = T, extension = "csv",
         header = F, tradecolnames = NULL,
         format=format, onefile = T )

But I got the following error message:

> Error in `$<-.data.frame`(`*tmp*`, "COND", value = numeric(0)) :   
> replacement has 0 rows, data has 23855

I believe the default column names in the function is: c("SYMBOL", "DATE", "EX", "TIME", "PRICE", "SIZE", "COND", "CORR", "G127") which is different from my dataset, so I manually changed it in my .csv to match it. Then I got another error

>Error in xts(tdata, order.by = tdobject) : 'order.by' cannot contain 'NA', 'NaN', or 'Inf'

Tried to look at the original code, but couldn't find a solution.

Any suggestion would be really helpful. Thanks!


Solution

  • When I run your code on the data to which you provide a link, I get the second error you mention:

    Error in xts(tdata, order.by = tdobject) : 
      'order.by' cannot contain 'NA', 'NaN', or 'Inf'
    

    This error can be traced to these lines in the function highfrequency:::makeXtsTrades(), which is called by highfrequency::convert():

    tdobject = as.POSIXct(paste(as.vector(tdata$DATE), as.vector(tdata$TIME)), 
            format = format, tz = "GMT")
    tdata = xts(tdata, order.by = tdobject)
    

    The error results from two problems:

    1. The variable "DATE" in your data file is read into R as numeric, whereas it appears that the code creating tdobject expects tdata$DATE to be a character vector. You could fix this by manually converting that variable to a character vector:

      tdata <- read.csv("IBM_trades.csv")
      tdata$DATE <- as.character(tdata$DATE)
      write.csv(tdata, file = "IBM_trades_DATE_fixed.csv", row.names = FALSE)
      
    2. The variable "TIME_M" in your data file is not a time of the format "%H:%M:%S". It looks like it is only the minutes and seconds component of a more complete time variable, because values only contain one colon and the values before and after the colon vary from 0 to 59.9. Fixing this problem would require finding the hour component of the time variable.

    These two problems result in tdobject being filled with NA values rather than valid date-times, which causes an error when xts::xts() tries to order the data by tdobject.

    The more general issue seems to be that the function highfrequency::convert() expects your data to follow something like the format described here on the WRDS website, but your data has slightly different column names and possibly different value formats. I would recommend taking a close look at that WRDS page and the documentation for your data file and determining which variables in your data correspond to those described on that page (for instance, it's not clear to me that your data contains any variable that is equivalent to "G127").