I have a dataset where column 1 is date-time and column 2 is the price at a specific point in time. This data is downloaded to Excel with bloomberg excel add-in. Then I used read_excel function to import this file to R.
This is how the data looks like in R
Question: the data is supposed to be with 1 min intervals, but it is not always the case. Sometimes the time in the next row is more than 1 min later. So, how can I insert extra rows for the missing minutes? So, for each date I would like to have the following sequence:
08:00
08:01
08:02
...
16:58
16:59
17:00
For these points in time, I would like keep the price from the dataset. If the price is not there, it should add missing. For example if we have:
...
12:31 100
12:32 102
12:35 101
...
then I would like to have:
...
12:31 100
12:32 102
12:33 missing
12:34 missing
12:35 101
...
what is the easiest way to do this? Thank you!
You can create an xts with the prices you have and merge it with a sequence that has a higher frequency (e.g. every minute).
library(xts)
library(lubridate)
set.seed(123)
prices <- 100 + rnorm(16)
timeindex <- seq(ymd_hm('2020-05-28 08:45'),
ymd_hm('2020-05-28 09:15'),
by = '2 mins')
prices_xts <- xts(prices, order.by = timeindex)
> head(prices_xts)
[,1]
2020-05-28 08:45:00 99.43952
2020-05-28 08:47:00 99.76982
2020-05-28 08:49:00 101.55871
2020-05-28 08:51:00 100.07051
2020-05-28 08:53:00 100.12929
2020-05-28 08:55:00 101.71506
timeindex2 <- seq(ymd_hm('2020-05-28 08:45'),
ymd_hm('2020-05-28 09:15'),
by = '1 mins')
prices_with_gaps_xts <- merge.xts(prices_xts,
timeindex2)
> head(prices_with_gaps_xts)
prices_xts
2020-05-28 08:45:00 99.43952
2020-05-28 08:46:00 NA
2020-05-28 08:47:00 99.76982
2020-05-28 08:48:00 NA
2020-05-28 08:49:00 101.55871
2020-05-28 08:50:00 NA