r xts zoo

Using diff() inside a function raises an error and the function does not run


My reproducible data looks like this:

library(xts)

data <- rnorm(16)
timeStamp <- as.POSIXct("2019-03-18 10:30:00") + 0:15*60
Rf <- xts(x = data, order.by = timeStamp)
colnames(Rf) <- "R"

Rf[4:5]$R <- NA
Rf[8:9]$R <- NA
Rf[13:14]$R <- NA

omit.Rf <- na.omit(Rf)

My goal is to number each run of consecutive timestamps in chronological order, and the following code works:

diff.omit.Rf <- diff(index(omit.Rf))
diff.omit.Rf <- append(1, diff.omit.Rf)
initNum <- 1
for (i in 1:length(omit.Rf)){
  if (diff.omit.Rf[[i]] == 1){
    omit.Rf$opNum[i] <- initNum
  } else {
    initNum <- initNum + 1
    omit.Rf$opNum[i] <- initNum
  }
}

And I get this output:

                     R              opNum
2019-03-18 10:30:00  0.89262137     1
2019-03-18 10:31:00  0.50428310     1
2019-03-18 10:32:00 -0.00040488     1
2019-03-18 10:35:00  0.10126335     2
2019-03-18 10:36:00  0.48726498     2
2019-03-18 10:39:00  1.05075049     3
2019-03-18 10:40:00 -0.25495699     3
2019-03-18 10:41:00  0.89257782     3
2019-03-18 10:44:00 -1.25474533     4
2019-03-18 10:45:00  0.55393767     4

Unfortunately, when I wrap the same code in a function, it raises the following error and does not run:

Error in diff.omit.Rf[[i]] : subscript out of bounds

The function I wrote:

opTimeFun <- function(dataToDeal){
  diff.data <- diff(index(dataToDeal))
  diff.data <- append(1, diff.data)
  initNum <- 1
  for (i in 1:length(dataToDeal)){
    if (diff.data[[i]] == 1){
      dataToDeal$opNum[i] <- initNum
    } else {
      initNum <- initNum + 1
      dataToDeal$opNum[i] <- initNum
    }
  }
}

Can someone help me solve this problem? Thank you.


Solution

  • Here is a shorter version without a for loop, using diff and cumsum to number the runs.

    opTimeFun <- function(temp) {
       cumsum(c(TRUE, diff(index(temp)) > 1))
    }
    
    omit.Rf$opNum <- opTimeFun(omit.Rf)
    omit.Rf
    
    #                             R opNum
    #2019-03-18 10:30:00 -0.1952424     1
    #2019-03-18 10:31:00  0.8429390     1
    #2019-03-18 10:32:00 -0.2429325     1
    #2019-03-18 10:35:00  1.3471985     2
    #2019-03-18 10:36:00 -0.7869906     2
    #2019-03-18 10:39:00  0.5220991     3
    #2019-03-18 10:40:00 -1.9884231     3
    #2019-03-18 10:41:00 -1.8417666     3
    #2019-03-18 10:44:00  1.5586149     4
    #2019-03-18 10:45:00  3.5704500     4
    

    We can break the function down step by step to see how it works.

    diff returns the time differences between consecutive index values, reported here in minutes.

    diff(index(omit.Rf))
    #Time differences in mins
    #[1] 1 1 3 1 3 1 1 3 1
    

    We compare each difference with 1 minute to find the gaps that are longer than 1 minute.

    diff(index(omit.Rf)) > 1
    #[1] FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE
    

    Since diff returns a vector that is one element shorter than the original, we prepend a default value of TRUE.

    c(TRUE, diff(index(omit.Rf)) > 1)
    #[1]  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE
    

    Finally, we take the cumulative sum of this logical vector, which increments at each point where the gap is longer than 1 minute.

    cumsum(c(TRUE, diff(index(omit.Rf)) > 1))
    #[1] 1 1 1 2 2 3 3 3 4 4
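
    The steps above rely on diff reporting the gaps in minutes. For a POSIXct index the unit is chosen automatically from the data, so a series with a sub-minute gap would be reported in seconds and the comparison with 1 would mean something else. A minimal sketch that fixes the unit explicitly is shown below; label_runs, max_gap and units are hypothetical names introduced for illustration, not part of the original answer.

```r
# Hypothetical helper (not from the original answer): number runs of
# consecutive timestamps, measuring gaps in an explicitly chosen unit.
label_runs <- function(times, max_gap = 1, units = "mins") {
  # as.numeric() on a difftime accepts a `units` argument, so the result
  # does not depend on whichever unit diff() picked automatically.
  gaps <- as.numeric(diff(times), units = units)
  # A new run starts at the first element and after every gap > max_gap.
  cumsum(c(TRUE, gaps > max_gap))
}

# Same gap pattern as the example data: minutes 0-2, 5-6, 9-11, 14-15.
times <- as.POSIXct("2019-03-18 10:30:00", tz = "UTC") +
  c(0:2, 5:6, 9:11, 14:15) * 60
label_runs(times)
```

    With the xts object from the question, this could be used as omit.Rf$opNum <- label_runs(index(omit.Rf)).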
    

    As for the original function, its logic is fine, but an R function modifies a local copy of its argument, so the updated object has to be returned explicitly. The function below should work.

    opTimeFun <- function(dataToDeal){
      diff.data <- diff(index(dataToDeal))
      diff.data <- append(1, diff.data)
      initNum <- 1
      # nrow(), not length(): length() of an xts object counts cells, not rows
      for (i in seq_len(nrow(dataToDeal))){
         if (diff.data[[i]] == 1){
            dataToDeal$opNum[i] <- initNum
         } else {
            initNum <- initNum + 1
            dataToDeal$opNum[i] <- initNum
         }
       }
       return(dataToDeal)
     }
    
    opTimeFun(omit.Rf)
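
    A likely cause of the "subscript out of bounds" message in the question is the loop bound 1:length(...). An xts object is stored as a matrix, so length() counts every cell (rows times columns). Once an opNum column exists, the loop index runs past the end of the difference vector. The sketch below assumes this is what happened. opTimeFunRows is a hypothetical variant that iterates over rows instead, not the asker's original function.

```r
library(xts)  # also attaches zoo, which provides index()

# Hypothetical variant of the loop, iterating over rows rather than cells.
opTimeFunRows <- function(dataToDeal) {
  # Gaps in minutes, with a leading 1 so the first row starts run 1.
  gaps <- c(1, as.numeric(diff(index(dataToDeal)), units = "mins"))
  opNum <- numeric(nrow(dataToDeal))
  initNum <- 1
  for (i in seq_len(nrow(dataToDeal))) {
    if (gaps[i] != 1) initNum <- initNum + 1
    opNum[i] <- initNum
  }
  dataToDeal$opNum <- opNum  # adds the column, or overwrites an existing one
  dataToDeal                 # return the modified copy to the caller
}

stamps <- as.POSIXct("2019-03-18 10:30:00", tz = "UTC") + c(0, 1, 4) * 60
x <- xts(cbind(R = c(0.1, 0.2, 0.3)), order.by = stamps)

c(cells = length(x), rows = nrow(x))  # equal while there is one column
x <- opTimeFunRows(x)
c(cells = length(x), rows = nrow(x))  # cells now exceed rows
x <- opTimeFunRows(x)                 # calling it again is still safe
```

    Because the function returns a modified copy, the result has to be assigned back, for example omit.Rf <- opTimeFunRows(omit.Rf).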