Tags: r, range, binning

R - Identifying ranges of fixed width by value count


Assume an ordered set of 100 binary values. Using a window size of 10, I would like to know the ranges (i.e., start and end position) of those windows that contain at least x "1s" (where x=3, for example).

> set.seed(123456789)
> full=rep(0,100)
> full[sample(1:100, 15)]=1
> split(full, ceiling(seq_along(full)/10))
$`1`
 [1] 0 0 0 0 0 1 0 0 0 0

$`2`
 [1] 0 0 1 0 0 0 0 0 0 0

$`3`
 [1] 0 0 1 0 1 0 0 0 0 0

$`4`
 [1] 0 0 0 0 0 0 0 1 0 0

$`5`
 [1] 0 1 0 0 0 0 0 0 1 0

$`6`
 [1] 0 0 0 0 0 0 0 0 0 0

$`7`
 [1] 0 0 0 0 1 0 1 0 0 1

$`8`
 [1] 0 0 0 0 0 1 0 0 0 0

$`9`
 [1] 0 0 0 0 0 1 1 0 1 0

$`10`
 [1] 0 0 0 0 0 0 0 0 0 1

Here's what I am looking for:

> desired_function(full)
61-70
81-90  

Solution

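  • One option is rollapply from the zoo package with width = 10 and by = 10, so that the sum is taken over each non-overlapping block of 10 positions rather than over every sliding window. Because the data are binary, each sum is the number of 1s in that block. Build a start-end label for every block, then keep the labels of the blocks whose count is at least 3. For the data in the question this gives 61-70 and 81-90.

    library(zoo)
    # count the 1s in each non-overlapping window: width 10, stepping 10 at a time
    counts <- rollapply(full, width = 10, by = 10, FUN = sum)
    # label each window by its start and end position
    starts <- seq(1, length(full), by = 10)
    labels <- paste(starts, starts + 9, sep = "-")
    # keep the windows with at least three 1s
    labels[counts >= 3]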
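
For comparison, the fixed, non-overlapping windows shown in the question can also be counted in base R, without zoo. The following is a minimal sketch, not the answerer's code: the function name windows_with_ones and its arguments width and min_count are made up for illustration, and it assumes x contains only 0s and 1s.

```r
# Hypothetical helper (name and arguments are illustrative, not from the post):
# report the fixed-width windows that hold at least `min_count` ones.
windows_with_ones <- function(x, width = 10, min_count = 3) {
  # Assign each position to a non-overlapping window: 1..width -> 1, and so on
  bin <- ceiling(seq_along(x) / width)
  # Count the ones in each window (x is assumed to be binary)
  counts <- tapply(x, bin, sum)
  # Indices of the windows that meet the threshold
  hits <- unname(which(counts >= min_count))
  starts <- (hits - 1) * width + 1
  # Clamp the end in case length(x) is not a multiple of width
  ends <- pmin(hits * width, length(x))
  paste(starts, ends, sep = "-")
}

# Small example: two windows of width 5, only the first has three ones
x <- c(1, 1, 0, 1, 0,  0, 0, 1, 0, 0)
windows_with_ones(x, width = 5, min_count = 3)
# [1] "1-5"
```

Calling windows_with_ones(full) on the vector from the question uses the defaults of width 10 and at least three 1s. The threshold is written as >= rather than == so that windows with more than three 1s are also reported, which is what "at least x" asks for.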