My reproducible data looks like this:
library(xts)
data <- rnorm(16)
timeStamp <- as.POSIXct("2019-03-18 10:30:00") + 0:15*60
Rf <- xts(x = data, order.by = timeStamp)
colnames(Rf) <- "R"
Rf[4:5]$R <- NA
Rf[8:9]$R <- NA
Rf[13:14]$R <- NA
omit.Rf <- na.omit(Rf)
My goal is to label each run of consecutive timestamps chronologically, and the following code works:
diff.omit.Rf <- diff(index(omit.Rf))
diff.omit.Rf <- append(1, diff.omit.Rf)
initNum <- 1
for (i in 1:length(omit.Rf)){
  if (diff.omit.Rf[[i]] == 1){
    omit.Rf$opNum[i] <- initNum
  } else {
    initNum <- initNum + 1
    omit.Rf$opNum[i] <- initNum
  }
}
And I get this output:
R opNum
2019-03-18 10:30:00 0.89262137 1
2019-03-18 10:31:00 0.50428310 1
2019-03-18 10:32:00 -0.00040488 1
2019-03-18 10:35:00 0.10126335 2
2019-03-18 10:36:00 0.48726498 2
2019-03-18 10:39:00 1.05075049 3
2019-03-18 10:40:00 -0.25495699 3
2019-03-18 10:41:00 0.89257782 3
2019-03-18 10:44:00 -1.25474533 4
2019-03-18 10:45:00 0.55393767 4
Unfortunately, when I wrap the same code in a function, it does not run and fails with the following error:
Error in diff.omit.Rf[[i]] : subscript out of bounds
Here is the function I wrote:
opTimeFun <- function(dataToDeal){
  diff.data <- diff(index(dataToDeal))
  diff.data <- append(1, diff.data)
  initNum <- 1
  for (i in 1:length(dataToDeal)){
    if (diff.data[[i]] == 1){
      dataToDeal$opNum[i] <- initNum
    } else {
      initNum <- initNum + 1
      dataToDeal$opNum[i] <- initNum
    }
  }
}
Can someone help me solve this problem? Thank you.
Here is a shorter version that avoids the for loop and uses diff and cumsum to create the series labels.
opTimeFun <- function(temp) {
  cumsum(c(TRUE, diff(index(temp)) > 1))
}
omit.Rf$opNum <- opTimeFun(omit.Rf)
omit.Rf
# R opNum
#2019-03-18 10:30:00 -0.1952424 1
#2019-03-18 10:31:00 0.8429390 1
#2019-03-18 10:32:00 -0.2429325 1
#2019-03-18 10:35:00 1.3471985 2
#2019-03-18 10:36:00 -0.7869906 2
#2019-03-18 10:39:00 0.5220991 3
#2019-03-18 10:40:00 -1.9884231 3
#2019-03-18 10:41:00 -1.8417666 3
#2019-03-18 10:44:00 1.5586149 4
#2019-03-18 10:45:00 3.5704500 4
We can break the function down step by step to understand how it works.
diff returns the time differences between consecutive timestamps, which for this data are reported in minutes.
diff(index(omit.Rf))
#Time differences in mins
#[1] 1 1 3 1 3 1 1 3 1
We compare these differences with 1 minute to find the gaps that are longer than 1 minute.
diff(index(omit.Rf)) > 1
#[1] FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE
Since diff returns a vector that is one element shorter than the original, we add a default value of TRUE at the beginning so that the first row starts series 1.
c(TRUE, diff(index(omit.Rf)) > 1)
#[1] TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE
Finally, we take the cumulative sum of this logical vector, which increments at each point where the gap is greater than 1 minute.
cumsum(c(TRUE, diff(index(omit.Rf)) > 1))
#[1] 1 1 1 2 2 3 3 3 4 4
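One caveat about the steps above: diff on POSIXct values chooses its unit automatically from the size of the gaps, so comparing with 1 means "1 minute" only because these gaps happen to be reported in minutes. Below is a minimal sketch that pins the unit explicitly. The names label_runs and max_gap_mins are hypothetical (not part of xts), and the function takes a plain POSIXct vector such as index(omit.Rf).

```r
# The unit of diff() depends on the data: gaps under a minute give seconds.
t_min <- as.POSIXct("2019-03-18 10:30:00", tz = "UTC") + c(0, 60, 240)
t_sec <- as.POSIXct("2019-03-18 10:30:00", tz = "UTC") + c(0, 30, 240)
units(diff(t_min))  # "mins"
units(diff(t_sec))  # "secs"

# Hypothetical helper: label runs of timestamps whose gaps do not exceed
# max_gap_mins, always measuring the gaps in minutes.
label_runs <- function(times, max_gap_mins = 1) {
  gaps <- as.numeric(diff(times), units = "mins")
  cumsum(c(TRUE, gaps > max_gap_mins))
}

# Same timestamps as the rows kept in omit.Rf
times <- as.POSIXct("2019-03-18 10:30:00", tz = "UTC") +
  c(0:2, 5:6, 9:11, 14:15) * 60
label_runs(times)
# [1] 1 1 1 2 2 3 3 3 4 4
```

With the xts object from the question, this would presumably be used as omit.Rf$opNum <- label_runs(index(omit.Rf)).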
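As far as the original function is concerned, the loop logic is fine, but an R function works on a copy of its argument, so we need to explicitly return the object from the function and assign the result. Also note that length() on an xts object counts every cell (rows times columns), so 1:length(dataToDeal) runs past the end of diff.data once the object has more than one column. That is the likely source of the "subscript out of bounds" error if omit.Rf already had an opNum column when the code was run again. Looping over the rows with nrow() avoids this, so the function below should work.
opTimeFun <- function(dataToDeal){
  diff.data <- diff(index(dataToDeal))
  diff.data <- append(1, diff.data)
  initNum <- 1
  # Loop over rows; length() would count every cell of the xts object
  for (i in seq_len(nrow(dataToDeal))){
    if (diff.data[[i]] == 1){
      dataToDeal$opNum[i] <- initNum
    } else {
      initNum <- initNum + 1
      dataToDeal$opNum[i] <- initNum
    }
  }
  return(dataToDeal)
}
omit.Rf <- opTimeFun(omit.Rf)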
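To illustrate why the explicit return matters, and why counting rows is safer than length() for matrix-like objects, here is a minimal base-R sketch. It assumes a plain matrix as a stand-in for the xts object, and add_flag is a hypothetical helper introduced only for this illustration.

```r
# A plain matrix stands in for the xts object (assumption for illustration).
m <- matrix(1:10, ncol = 1, dimnames = list(NULL, "R"))

# Hypothetical helper: adds a column to its local copy of x and returns it.
add_flag <- function(x) {
  x <- cbind(x, flag = 0)
  x  # without this line the modified copy would be discarded
}

res <- add_flag(m)  # the caller's m is untouched by the call
ncol(m)             # 1
ncol(res)           # 2

m <- add_flag(m)    # reassign to keep the change
ncol(m)             # 2

# length() counts every cell, nrow() counts rows
length(m)           # 20
nrow(m)             # 10
```

Once a second column exists, a loop over 1:length(m) would run to 20 even though there are only 10 rows, which is how an index can go out of bounds.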