Search code examples
rdataframecalculated-columnscalculation

How to calculate and extract prices from timestamps in R


I have data for commodity price as you see in the table below:

Date      Time         Price
19990104  14:11:14.34  220 
19990104  14:11:21.21  200 
19990104  14:11:36.35  221  
19990104  14:11:45.45  202  
19990104  14:11:56.11  215  

You see the time is 14h 11m and x seconds, with.xx milliseconds. I m trying to find the first value, the last value, the highest and the lowest value given in a minute. I have data for thousands of days and minutes, above is just an extract.

As a result I want to create a row with all the information. For the table above the result should be:

Date     Time      Start End  Low High
19990104 14:11:00  220   215  200 221   

Any help is appreciated. Thank you!


Solution

  • Up front, it might be a better path to convert your Date and Time fields into a single POSIXt-class object. This would be a good method if you have need for Date+Time to be a numeric-like field at some point (e.g., plotting something over time). It's not required, but in my experience I almost always have need to treat time numerically (and date usually needs to be there too).

    If you don't want/need to change to POSIXt or Time class, you can do the below. (I added a couple of data rows in order to show multiple summary rows.)

    base R

    dat$min <- substr(dat$Time, 1, 5)
    aggregate(dat$Price, dat[,c("Date","min")], function(Price) c(Start=Price[1], End=Price[length(Price)], Low=min(Price), High=max(Price)))
    #       Date   min x.Start x.End x.Low x.High
    # 1 19990104 14:11     220   215   200    221
    # 2 19990104 14:12     229   209   209    229
    

    dplyr

    library(dplyr)
    dat %>%
      arrange(Date, Time) %>%
      group_by(Date, min = substr(dat$Time, 1, 5)) %>%
      summarize(Time = min(Time), Start = first(Price), End = last(Price), Low = min(Price), High = max(Price)) %>%
      ungroup() %>%
      select(-min)
    # # A tibble: 2 x 6
    #       Date Time     Start   End   Low  High
    #      <int> <chr>    <int> <int> <int> <int>
    # 1 19990104 14:11:14   220   215   200   221
    # 2 19990104 14:12:14   229   209   209   229
    

    Data

    dat <- structure(list(Date = c(19990104L, 19990104L, 19990104L, 19990104L, 19990104L, 19990104L, 19990104L), Time = c("14:11:14", "14:11:21", "14:11:36", "14:11:45", "14:11:56", "14:12:14", "14:12:21"),     Price = c(220L, 200L, 221L, 202L, 215L, 229L, 209L)), class = "data.frame", row.names = c(NA, -7L))