Search code examples
rstockquotes

Data Manipulation in R, Stock Data Transformation


I would like to create a following data.frame from classic "time * OHLC" data format of exchange data.

Starting point is following data frame:

      Date   Time    Open    High     Low   Close
01/28/2002   0833  543.81  543.82  543.84  543.85
01/28/2002   0850  542.95  542.95  542.95  542.95
01/28/2002   0901  542.45  542.45  542.45  542.45
01/28/2002   0911  542.45  542.45  542.45  542.45

There is 1534129 rows in my table. I am a little bit desperate with moving data into the following structure:

      Date  Time   Price
01/28/2002  0833  543.81
01/28/2002  0833  543.82
01/28/2002  0833  543.84
01/28/2002  0833  543.85
01/28/2002  0850  542.95

That is the way how first line should be rewritten and this extension should be repeated on every line of the original file. A second part of my task is to set a parameter (distribution) which is going to decide whether High Or Low comes first during the bar creation phase. This of course has further implication on data manipulation later, but I can't get to the starting point yet.

Later, I would like to work with code and decide how data look like when you chose always high, then low is created first (and opposite), or the hardest thing, because its not done deterministically, is version with distribution that will decide which goes first.

Hopefully it describes the task (question) exactly. I am glad for every tip, or idea. Thank you for help.


Solution

  • Try

     library(tidyr)
     library(dplyr)
     df1 <- df %>%
                 gather(Var, Price, Open:Close) %>%
                 arrange(Date, Time) %>%
                 select(-Var)
    
    
     head(df1)
     #        Date Time  Price
     #1 01/28/2002  0833 543.81
     #2 01/28/2002  0833 543.82
     #3 01/28/2002  0833 543.84
     #4 01/28/2002  0833 543.85
     #5 01/28/2002  0850 542.95
     #6 01/28/2002  0850 542.95
    

    data

     df <-structure(list(Date = c("01/28/2002", "01/28/2002", "01/28/2002", 
     "01/28/2002"), Time = c("0833", "0850", "0901", "0911"), Open = c(543.81, 
     542.95, 542.45, 542.45), High = c(543.82, 542.95, 542.45, 542.45
     ), Low = c(543.84, 542.95, 542.45, 542.45), Close = c(543.85, 
     542.95, 542.45, 542.45)), .Names = c("Date", "Time", "Open", 
    "High", "Low", "Close"), row.names = c(NA, -4L), class = "data.frame")