Tags: r, time-series, xts, quantitative-finance, hft

Select a range of 5 mins by date and time using R


I have time series data in the following format:

                        Ask    Bid  Trade Ask_Size Bid_Size Trade_Size
2016-11-01 01:00:03     NA 938.10     NA       NA      203         NA
2016-11-01 01:00:04     NA 937.20     NA       NA      100         NA
2016-11-01 01:00:04 938.00     NA     NA       28       NA         NA
2016-11-01 01:00:04     NA 938.10     NA       NA      203         NA
2016-11-01 01:00:04 939.00     NA     NA       11       NA         NA
2016-11-01 01:00:05     NA 938.15     NA       NA       19         NA
2016-11-01 01:00:06     NA 937.20     NA       NA      100         NA
2016-11-01 01:00:06 938.00     NA     NA       28       NA         NA
2016-11-01 01:00:06     NA     NA 938.10       NA       NA         69
2016-11-01 01:00:06     NA     NA 938.10       NA       NA        831
2016-11-01 01:00:06     NA 938.10     NA       NA      134         NA

The structure of the time series data is:

str(df_ts)

An ‘xts’ object on 2016-11-01 01:00:03/2016-11-02 12:59:37 containing:
  Data: num [1:35797, 1:6] NA NA 938 NA 939 NA NA 938 NA NA ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:6] "Ask" "Bid" "Trade" "Ask_Size" ...
  Indexed by objects of class: [POSIXct,POSIXt] TZ: 
  xts Attributes:  
 NULL

How do I create a 5-minute subset of this time series? The start time and end time will be user-defined.

The sample data can be found at:

https://www.dropbox.com/s/m94y6pbhjlkny1l/Sample_HFT.csv?dl=0

Please help


Solution

  • You can use lubridate together with base R aggregation. I am assuming your timestamp (date and time) is in the first column, which I have named "timestamp", and that the data frame is df. Install the lubridate package first. Note that this groups the rows into 5-minute bins and sums each column within a bin. The result is stored in a separate data frame, df2.

    library(lubridate)
    
    # Round to 5 minutes
    df$timestamp <- ceiling_date(as.POSIXct(df$timestamp), unit = "5 minutes")
    
    # Sum every other column within each 5-minute bin (NAs are ignored)
    df2 <- aggregate(df[, -1, drop = FALSE],
                     by = list(timestamp = df$timestamp),
                     FUN = sum, na.rm = TRUE)
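
If the goal is to pull out a single 5-minute window from the xts object itself, rather than to aggregate into bins, a minimal sketch along the following lines may help. It is not part of the answer above. The small `df_ts` built here is hypothetical stand-in data with one row per second, and `start_time` and `end_time` are illustrative names for the user-defined bounds. The index is created in UTC on the assumption that the real data uses a single, known time zone.

```r
library(xts)

# Hypothetical stand-in for df_ts: one row per second for 15 minutes
idx <- as.POSIXct("2016-11-01 01:00:00", tz = "UTC") + 0:899
df_ts <- xts(matrix(seq_along(idx), ncol = 1, dimnames = list(NULL, "Bid")),
             order.by = idx)

# User-defined start; the end is 5 minutes later
start_time <- as.POSIXct("2016-11-01 01:05:00", tz = "UTC")
end_time   <- start_time + 5 * 60

# Option 1: compare against the index directly (half-open window [start, end))
win_idx <- df_ts[index(df_ts) >= start_time & index(df_ts) < end_time]

# Option 2: xts "start/end" range string (both ends are included)
rng <- paste(format(start_time, "%Y-%m-%d %H:%M:%S"),
             format(end_time,   "%Y-%m-%d %H:%M:%S"), sep = "/")
win_str <- df_ts[rng]
```

Option 1 makes the boundary handling explicit, which can matter when consecutive windows must not overlap. Option 2 is shorter, but the range string is interpreted in the time zone of the xts index, so the bounds should be formatted in that same zone.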