
In a time series, find the longest subsequent period for which a condition is met


I need to search through a data frame as follows: if grade goes above 50% at time=1, then drops below 50% at time=3, then above at time=4 and below at time=7, then above at time=8 and below at time=12, etc., then it was above for 2 seconds, then 3 seconds, then 4 seconds, etc. So the final result required is the max time above 50%, which is 4 seconds in this case (as per the data frame section below). So I need the value of 4 to be assigned to max(maxGradeTimePeriod). Simple code for this, please? Thanks in advance.

This is what I have tried thus far (1 of many attempts!):

    maxGradeTimePeriod <- c()
    i <- 1

    while (i <= nrow(df)) {
            if (0.5 <= df$Grade[i]) {
                    p <- i+1
                    k <- p
                    while (k < (nrow(df)-1)) {
                            if (df$Grade[k] < 0.5) {
                                    time <- (df$Time[k-1])-df$Time[i]
                                    print(time)
                                    maxGradeTimePeriod <- append(maxGradeTimePeriod, time)
                            }
                            else {
                                    time <- max(df$Time)-df$Time[i]
                                    maxGradeTimePeriod <- append(maxGradeTimePeriod, time)
                            }
                            k <- k+1
                    }
                    i <- i+1
            }
            else {
                    i <- i+1
            }
    }

Sample data frame:

 time grade
    1   0.5
    2   0.5
    3   0.1
    4   0.5
    5   0.5
    6   0.5
    7   0.1
    8   0.5
    9   0.5
   10   0.5
   11   0.5
   12   0.1
   13   0.5
   14   0.5
   15   0.5
   16   0.1
   17   0.5
   18   0.5
   19   0.1
   20   0.5

Solution

  • Assuming that each row is one time unit, try:

    y <- rle(x$grade >= 0.5)              # Find clusters of values not below 0.5
    max(y$lengths[y$values])              # Largest TRUE cluster (rle() names this element "lengths")
    # [1] 4
    

    It could be that the time points do not have even spacing. In that case, try:

    x2 <- rle(x$grade >= 0.5)             # Get all clusters
    x3 <- rep(seq_along(x2$lengths), x2$lengths)  # Make a vector that specifies cluster per value
    time <- c(0, diff(x$time))            # Make a new vector with time differences
    y <- aggregate(time ~ x3, FUN=sum)    # Aggregate the sum of time per cluster
    max(y$time[x2$values])                # Take the max over the TRUE clusters only
    # [1] 4
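
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the evenly spaced case. It rebuilds the sample data from the question and wraps the `rle()` idea in a small function. The helper name `longest_run_above` and its `threshold` argument are made up for illustration, and it assumes `grade` has no `NA` values and that each row is one time unit.

```r
# Sample data copied from the question (column names as used in the answer).
x <- data.frame(
  time  = 1:20,
  grade = c(0.5, 0.5, 0.1, 0.5, 0.5, 0.5, 0.1, 0.5, 0.5, 0.5,
            0.5, 0.1, 0.5, 0.5, 0.5, 0.1, 0.5, 0.5, 0.1, 0.5)
)

# Hypothetical helper (not part of base R): length of the longest run of
# consecutive rows whose grade is at or above the threshold.
# Assumes no NA values in `grade`.
longest_run_above <- function(grade, threshold = 0.5) {
  runs  <- rle(grade >= threshold)      # run-length encode the TRUE/FALSE flags
  above <- runs$lengths[runs$values]    # keep only the runs that are TRUE
  if (length(above) == 0) 0L else max(above)  # 0 if grade never reaches the threshold
}

result <- longest_run_above(x$grade)
print(result)
# [1] 4
```

Returning 0 when there is no run at or above the threshold is a design choice here; calling `max()` on an empty vector would otherwise give `-Inf` with a warning.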