
Find the maximum and mean length of the consecutive "TRUE"-arguments


I started with a daily time series of wind speeds. I want to examine whether the mean and maximum number of consecutive days below a certain threshold change between two periods of time. This is how far I've come: I subset the data to rows with values below the threshold and identified consecutive days.

I now have a data frame that looks like this:

dates   consecutive_days
1970-03-25  NA
1970-04-09  TRUE
1970-04-10  TRUE
1970-04-11  TRUE
1970-04-12  TRUE
1970-04-15  FALSE
1970-05-08  TRUE
1970-05-09  TRUE
1970-05-13  FALSE

What I want to do next is find the maximum and mean length of the runs of consecutive TRUE values (in this case: maximum = 4, mean = 3).


Solution

  • Here is one method using rle:

    # construct a sample data.frame:
    set.seed(1234)
    df <- data.frame(days = 1:12, consec = sample(c(TRUE, FALSE), 12, replace = TRUE))

    # get rle object
    consec <- rle(df$consec)

    # max length of the runs of TRUE
    max(consec$lengths[consec$values])
    # mean length of the runs of TRUE
    mean(consec$lengths[consec$values])
    

    Quoting from ?rle, rle "Compute[s] the lengths and values of runs of equal values in a vector".

    We save the result, subset the run lengths to the runs of TRUE, and then calculate the mean and max of those lengths.
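
    To see what rle returns, here is a small sketch on the consecutive_days column from the question. The vector x is typed in by hand and assumed to be a logical vector, which may not match how the column is actually stored. Each NA forms its own run, so a plain logical subset would carry an NA through to max() and mean(); wrapping the condition in which() drops it.

    ```r
    # consecutive_days column from the question, entered by hand (assumed logical)
    x <- c(NA, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)

    runs <- rle(x)
    runs$lengths  # run lengths: 1 4 1 2 1
    runs$values   # run values:  NA TRUE FALSE TRUE FALSE

    # which() keeps only the runs whose value is TRUE and drops the NA run
    true_runs <- runs$lengths[which(runs$values)]
    max(true_runs)   # 4
    mean(true_runs)  # 3
    ```

    This reproduces the maximum of 4 and mean of 3 expected in the question.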

    You could easily combine this into a function, or simply concatenate the results above:

    myResults <- c(max = max(consec$lengths[consec$values]),
                   mean = mean(consec$lengths[consec$values]))
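
    The function version mentioned above might look like the following sketch. The name run_stats and its return format are hypothetical, not part of the original answer. It assumes a logical input vector and uses which() so that NA entries, such as the first row of the question's data, are ignored rather than propagated into the result.

    ```r
    # Hypothetical helper: max and mean length of the runs of TRUE in a logical vector.
    # NA entries form their own runs in rle() and are dropped by which().
    run_stats <- function(x) {
      runs <- rle(x)
      true_runs <- runs$lengths[which(runs$values)]
      if (length(true_runs) == 0) {
        # no runs of TRUE at all: return NA rather than -Inf / NaN
        return(c(max = NA_real_, mean = NA_real_))
      }
      c(max = max(true_runs), mean = mean(true_runs))
    }

    # the question's column, entered by hand (assumed logical)
    consecutive_days <- c(NA, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)
    run_stats(consecutive_days)
    ```

    To compare two periods, one could call run_stats on each period's subset of the column and compare the two named vectors.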