
Function with cbind returning NA in specific columns


I am using a function that identifies a sequence and then calculates its duration in minutes. When I cbind the results with the data in the final step, the duration is returned, but the neighboring columns contain NA instead of the values originally in those columns.

d<-read.table(text='Date.Time Aerial
794  "2012-10-01 08:18:00"      1
795  "2012-10-01 08:34:00"      1
796  "2012-10-01 08:39:00"      1
797  "2012-10-01 08:42:00"      1
798  "2012-10-01 08:48:00"      1
799  "2012-10-01 08:54:00"      1
800  "2012-10-01 08:58:00"      1
801  "2012-10-01 09:04:00"      1
802  "2012-10-01 09:05:00"      1
803  "2012-10-01 09:11:00"      1
1576 "2012-10-01 09:17:00"      2
1577 "2012-10-01 09:18:00"      2
804  "2012-10-01 09:19:00"      1
805  "2012-10-01 09:20:00"      1
1580 "2012-10-01 09:21:00"      2
1581 "2012-10-01 09:23:00"      2
806  "2012-10-01 09:25:00"      1
807  "2012-10-01 09:32:00"      1
808  "2012-10-01 09:37:00"      1
809  "2012-10-01 09:43:00"      1', header=TRUE, stringsAsFactors=FALSE, row.names=1)
#Give correct data type
d$Aerial<- as.numeric(d$Aerial)
d$Date.Time<- as.POSIXct(d$Date.Time)

The function, which identifies each sequence where a given aerial (here 2) is repeated and computes the duration of that sequence:

fun1 <- function(data,aerial){
  data_above <- 1L*(data$Aerial == aerial)
  id_start <- paste(data$Date.Time[which(diff(c(0L,data_above))==1)])
  id_end <- paste(data$Date.Time[which(diff(c(data_above,0L))== -1)])
  res <- cbind(data[id_start,1:1],Duration=difftime(id_end,id_start, units='mins'))
  return(res)
}
fun1(d,2)

Returns:

        Duration
[1,] NA        1
[2,] NA        2

The duration is correct. However, I would like it to return the data that should be in the associated columns:

     Date.Time                     Duration
[1,] 2012-10-01 09:11:00            1
[2,] 2012-10-01 09:21:00            2

My actual data.frame has many columns rather than just Date.Time, and it still returns NA for all of them.


Solution

  • I'd do it like this:

    fun1 <- function(data,aerial) {
        data_above <- 1L * (data$Aerial == aerial)
        id_start <- data$Date.Time[which(diff(c(0L,data_above)) == 1)]
        id_end <- data$Date.Time[which(diff(c(data_above, 0L)) == -1)]
        res <- cbind(data[data$Date.Time %in% id_start, 1, drop = FALSE],
                     Duration = difftime(id_end, id_start, units = 'mins'))
        return(res)
    }
    fun1(d,2)
    
    #                Date.Time Duration
    # 1576 2012-10-01 09:17:00   1 mins
    # 1580 2012-10-01 09:21:00   2 mins
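
    The run detection above relies on diff() applied to a zero-padded 0/1 indicator. As a minimal sketch of that step alone, on a made-up indicator vector (x, starts and ends are illustrative names, not part of the original data):

```r
# Indicator vector: 1 where the aerial of interest is active, 0 elsewhere.
# (Hypothetical values, chosen only to show two separate runs.)
x <- c(0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L)

# Padding with a leading 0 lets a run that begins at position 1 count as a start;
# a step of +1 marks the first element of each run.
starts <- which(diff(c(0L, x)) == 1)

# Padding with a trailing 0 lets a run that reaches the last position count as an end;
# a step of -1 marks the last element of each run.
ends <- which(diff(c(x, 0L)) == -1)

starts  # 3 6
ends    # 4 7
```

    Because every start is paired with exactly one end, the two index vectors always have the same length, which is what lets difftime compare them element by element.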
    

    Points to note here:

    • When you subset a data.frame and the result has just one column, df[, 1] returns a vector. It's safer to use df[, 1, drop = FALSE], which keeps the result a data.frame.

    • If none of the arguments passed to cbind is a data.frame, the result is a matrix; cbind returns a data.frame only when at least one argument is one. So if you omit drop = FALSE and the subset has a single column, it becomes a vector and the result of cbind is a matrix (see the first point).

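    • I don't think you need paste here for id_start and id_end. It converts the date-times to character strings, and a character row index is matched against row names rather than against the values of Date.Time, which is where the NA values come from.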

    • The first argument to cbind, where you subset data, is not right. You have to look up id_start among the values of Date.Time, which can be done using %in% as shown.
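
    The points above can be checked on a small example. This is only a sketch on a two-row toy data.frame (df, by_name, vec, one_col, as_matrix and as_frame are illustrative names, and tz = "UTC" is an assumption made to keep the example reproducible):

```r
# Toy data with numeric-looking row names, mirroring the layout in the question.
df <- data.frame(
  Date.Time = as.POSIXct(c("2012-10-01 09:17:00", "2012-10-01 09:18:00"), tz = "UTC"),
  Aerial = c(2, 2),
  row.names = c("1576", "1577")
)

# A character row index is matched against row names, not column values.
# No row is named "2012-10-01 09:17:00", so the lookup yields NA.
by_name <- df["2012-10-01 09:17:00", 1]
is.na(by_name)  # TRUE

# Selecting a single column drops to a plain vector unless drop = FALSE is given.
vec <- df[1, 1]                    # POSIXct vector
one_col <- df[1, 1, drop = FALSE]  # still a data.frame

# cbind() returns a data.frame only if at least one argument is a data.frame.
dur <- difftime(df$Date.Time[2], df$Date.Time[1], units = "mins")
as_matrix <- cbind(vec, Duration = dur)      # numeric matrix, date class is lost
as_frame <- cbind(one_col, Duration = dur)   # data.frame, Date.Time preserved

is.matrix(as_matrix)     # TRUE
is.data.frame(as_frame)  # TRUE
```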

    Hope this helps.