
Split a vector or data.frame into intervals by condition and print each interval's first and last value


I have a data.frame that looks like this:

v1 <- c(1:10)
v2 <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
dfb <- data.frame(v1, v2)

> dfb
   v1    v2
1   1 FALSE
2   2 FALSE
3   3  TRUE
4   4 FALSE
5   5 FALSE
6   6 FALSE
7   7  TRUE
8   8 FALSE
9   9 FALSE
10 10 FALSE

I need to perform these operations:

  1. split the data.frame into intervals wherever v2 is TRUE
  2. each row where v2 is TRUE is the last element of its interval
  3. if the last element is not TRUE, treat it as if it were (this is easily achieved by setting the last vector position to TRUE)
  4. print the v1 values of the first and last element of each interval

After these operations, my result should look like this:

  > df_final
   Vx Vy
    1 3
    4 7
    8 10

I've tried cumsum on the v2 vector, but the TRUE values are treated as the first element of each interval, not the last:

> split(v2, cumsum(v2==TRUE))
$`0`
[1] FALSE FALSE

$`1`
[1]  TRUE FALSE FALSE FALSE

$`2`
[1]  TRUE FALSE FALSE FALSE

Solution

  • You can still use cumsum; you just have to shift v2 one position to the right (prepending TRUE), so that each interval starts right after a TRUE:

    v3 <- c(TRUE,v2[-length(v2)])
    v3
     [1]  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE
    
    res <- split(v2,cumsum(v3))
    n <- length(res)
    res[[n]][length(res[[n]])] <- TRUE  # make the final interval end in TRUE (base R only; last() is not a base function)
    res
    $`1`
    [1] FALSE FALSE  TRUE
    
    $`2`
    [1] FALSE FALSE FALSE  TRUE
    
    $`3`
    [1] FALSE FALSE  TRUE
    
    # which() returns positions; they coincide with the v1 values here because v1 is 1:10
    df_final <- data.frame(Vx = which(v3), Vy = which(unlist(res, use.names = FALSE)))
    df_final
      Vx Vy
    1  1  3
    2  4  7
    3  8 10
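
The result above reports row positions, which match v1 only because v1 is 1:10. As a minimal sketch of a variant that looks up the actual v1 values, the helper below skips split and cumsum altogether and works directly from the positions of the TRUE values. The name interval_bounds and its arguments are hypothetical, not part of the original answer, and it assumes flag contains no NA values.

```r
# Hypothetical helper (not from the original answer): returns the first and
# last value of `x` for each interval that ends where `flag` is TRUE.
interval_bounds <- function(x, flag) {
  stopifnot(length(x) == length(flag), length(x) > 0)
  flag[length(flag)] <- TRUE             # treat the final element as an interval end
  ends <- which(flag)                    # positions that close an interval
  starts <- c(1L, head(ends, -1L) + 1L)  # each interval starts right after the previous end
  data.frame(Vx = x[starts], Vy = x[ends])
}

v1 <- 1:10
v2 <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
df_final <- interval_bounds(v1, v2)
df_final
```

Because the helper indexes into x, it should also work when v1 is not simply the row number, for example dates or measurement values.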