
Find if a certain value appears more than n times consecutively [R]


I have a list of vectors, for instance:

vec1 <- c(rep(0,5), 1, rep(0,11), rep(1,4), rep(0,6))
vec2 <- c(rep(0,11), 1, rep(0,18))
vec3 <- c(rep(0,3), rep(1,5), rep(0,21))
vec4 <- c(rep(0,23))
  
test_list <- list(vec1, vec2, vec3, vec4)

I would like to filter this list based on 2 conditions:

  1. 1 is present within the vector.
  2. 1 appears consecutively (in a row) more than 3 times.

My output should contain vec1 and vec3.

I wrote the following filter:

filter_ones <- test_list[sapply(test_list,function(vec) 1 %in% vec )]

And it returns vec1, vec2, and vec3.

How can I apply the second condition? I should probably use rle(), but I have no idea how. I would be grateful for any help.


Solution

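  • We can add a second condition built on rle() and combine it with the OP's first logical expression (1 %in% vec) using the short-circuiting &&, then pass the function to Filter() to keep the matching elements of the list.

    The binary vector is first converted to logical with as.logical(), so 1 becomes TRUE and 0 becomes FALSE. rle() then returns the length and value of each run. A run qualifies when its length is greater than the threshold 'n' and its value is TRUE (i.e. it is a run of 1s). Wrapping the comparison in any() reduces it to a single TRUE/FALSE per vector.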

    n <- 3
    Filter(function(x) 1 %in% x && any(with(rle(as.logical(x)), 
          lengths > n & values)), test_list)
    

    -output

    [[1]]
     [1] 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
    
    [[2]]
     [1] 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    
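To make the rle step more concrete, here is a small sketch that inspects the intermediate values for vec1 from the question. The names runs and long_run_of_ones are introduced here for illustration only and are not part of the original answer.

```r
# vec1 from the question: one isolated 1, then a run of four 1s
vec1 <- c(rep(0, 5), 1, rep(0, 11), rep(1, 4), rep(0, 6))
n <- 3

# Run-length encode the logical version of the vector
runs <- rle(as.logical(vec1))
runs$values   # FALSE TRUE FALSE TRUE FALSE
runs$lengths  # 5 1 11 4 6

# A run counts only if it is longer than n AND it is a run of TRUE (1s)
long_run_of_ones <- runs$lengths > n & runs$values
long_run_of_ones       # FALSE FALSE FALSE TRUE FALSE
any(long_run_of_ones)  # TRUE
```

The isolated 1 at position 6 forms a run of length 1 and is rejected, while the run of four 1s passes the threshold, so vec1 is kept.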

    Or, using the OP's sapply:

    test_list[sapply(test_list, function(vec) 1 %in% vec && 
          any(with(rle(as.logical(vec)), 
          lengths > n & values)))]

    -output

    [[1]]
     [1] 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
    
    [[2]]
     [1] 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
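
As a possible variation, the check can be wrapped in a small reusable function. The sketch below is not from the original answer: has_long_run, its value and n arguments, and the use of vapply are assumptions made for illustration. It also relies on the observation that a run of 1s longer than n already implies that 1 is present, so the separate 1 %in% x test is dropped here.

```r
# Hypothetical helper (not from the original answer): TRUE if `value`
# occurs in a run strictly longer than `n` consecutive elements.
has_long_run <- function(x, value = 1, n = 3) {
  runs <- rle(x == value)
  any(runs$lengths > n & runs$values)
}

# Data from the question
vec1 <- c(rep(0, 5), 1, rep(0, 11), rep(1, 4), rep(0, 6))
vec2 <- c(rep(0, 11), 1, rep(0, 18))
vec3 <- c(rep(0, 3), rep(1, 5), rep(0, 21))
vec4 <- c(rep(0, 23))
test_list <- list(vec1, vec2, vec3, vec4)

# vapply enforces that each result is a single logical value
keep <- vapply(test_list, has_long_run, logical(1))
keep  # TRUE FALSE TRUE FALSE

filtered <- test_list[keep]
length(filtered)  # 2 (vec1 and vec3)
```

Comparing with x == value rather than as.logical(x) lets the same function search for values other than 1, at the cost of differing slightly from the answer's code.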