I have data for commodity price as you see in the table below:
Date Time Price
19990104 14:11:14.34 220
19990104 14:11:21.21 200
19990104 14:11:36.35 221
19990104 14:11:45.45 202
19990104 14:11:56.11 215
You see the time is 14h 11m and x seconds, with.xx milliseconds. I m trying to find the first value, the last value, the highest and the lowest value given in a minute. I have data for thousands of days and minutes, above is just an extract.
As a result I want to create a row with all the information. For the table above the result should be:
Date Time Start End Low High
19990104 14:11:00 220 215 200 221
Any help is appreciated. Thank you!
Up front, it might be a better path to convert your Date
and Time
fields into a single POSIXt
-class object. This would be a good method if you have need for Date+Time to be a numeric-like field at some point (e.g., plotting something over time). It's not required, but in my experience I almost always have need to treat time numerically (and date usually needs to be there too).
If you don't want/need to change to POSIXt
or Time
class, you can do the below. (I added a couple of data rows in order to show multiple summary rows.)
dat$min <- substr(dat$Time, 1, 5)
aggregate(dat$Price, dat[,c("Date","min")], function(Price) c(Start=Price[1], End=Price[length(Price)], Low=min(Price), High=max(Price)))
# Date min x.Start x.End x.Low x.High
# 1 19990104 14:11 220 215 200 221
# 2 19990104 14:12 229 209 209 229
library(dplyr)
dat %>%
arrange(Date, Time) %>%
group_by(Date, min = substr(dat$Time, 1, 5)) %>%
summarize(Time = min(Time), Start = first(Price), End = last(Price), Low = min(Price), High = max(Price)) %>%
ungroup() %>%
select(-min)
# # A tibble: 2 x 6
# Date Time Start End Low High
# <int> <chr> <int> <int> <int> <int>
# 1 19990104 14:11:14 220 215 200 221
# 2 19990104 14:12:14 229 209 209 229
Data
dat <- structure(list(Date = c(19990104L, 19990104L, 19990104L, 19990104L, 19990104L, 19990104L, 19990104L), Time = c("14:11:14", "14:11:21", "14:11:36", "14:11:45", "14:11:56", "14:12:14", "14:12:21"), Price = c(220L, 200L, 221L, 202L, 215L, 229L, 209L)), class = "data.frame", row.names = c(NA, -7L))