Search code examples
ranalytics

Summarize Gaps in Binary Data using R


I am playing around with binary data.

I have data in columns in the following manner:

       A   B   C   D   E   F   G   H   I   J   K   L   M   N
       -----------------------------------------------------
       1   1   1   1   1   1   1   1   1   0   0   0   0   0
       0   0   0   0   1   1   1   0   1   1   0   0   1   0
       0   0   0   0   0   0   0   1   1   1   1   1   0   0

1 - Indicating that the system was on and 0 indicating that the system was off

I am trying to figure out ways to figure out a way to summarize the gaps between the on/off transition of these systems.

For example, for the first row, it stops working after 'I' for the second row, it works from 'E' to 'G' and then works again in 'I' and 'M' but is off during other.

Is there a way to summarize this?

I wish to see my result in the following form

    row-number    Number of 1's       Range
    ------------  ------------------  ------
    1                    9             A-I
    2                    3             E-G
    2                    2             I-J
    2                    1             M
    3                    5             H-L

Solution

  • Here's a tidyverse solution:

    library(tidyverse)
    df %>%
      rowid_to_column() %>%
      gather(col, val, -rowid) %>%
      group_by(rowid) %>%
      # This counts the number of times a new streak starts
      mutate(grp_num = cumsum(val != lag(val, default = -99))) %>%
      filter(val == 1) %>%
      group_by(rowid, grp_num) %>%
      summarise(num_1s = n(),
                range = paste0(first(col), "-", last(col)))
    
    
    ## A tibble: 5 x 4
    ## Groups:   rowid [3]
    #  rowid grp_num num_1s range
    #  <int>   <int>  <int> <chr>
    #1     1       1      9 A-I  
    #2     2       2      3 E-G  
    #3     2       4      2 I-J  
    #4     2       6      1 M-M  
    #5     3       2      5 H-L