Search code examples
rfilterextract

How to grab numerical values form certain block of values in R


I have a df dataframe as below.

df <- data.frame(id = c(1,2,3,4,5,6,7,8,9,10),
                 X1 = c(1,2,3,4,2,2,3,4,4,4),
                 X2 = c(1.23,1.23,1.23,1.23,1.23,1.23,1.23,1.23,1.23,1.23),
                 X3 = c(0,0,0,0,0,0,0,0,0,0),
                 X4 = c(1,1,1,1,1,1,1,1,1,1),
                 X5 = c(0,0,0,0,0,0,0,0,0,0),
                 X6 = c(0,0,0,0,0,0,0,0,0,0),
                 X7 = c(0,0,0,0,0,0,0,0,0,0),
                 X8 = c(1,1,0,2,8,0,1,3,4,5),
                 X9 = c(5,2,3,5,7,6,1,3,1,1),
                 X10 = c(1,2,1,4,0,6,7,5,5,6),
                 X11 = c(4,0,6,7,8,0,0,0,7,6),
                 X12 = c(0,1,0,0,0,6,5,4,0,0),
                 X13 = c(1,0,3,4,3,2,1,7,8,7),
                 X14 = c(1,2,NA,4,5,7,8,NA,8,5),
                 X15 = c(2,6,NA,6,5,NA,3,NA,NA,3))

> df
   id X1   X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15
1   1  1 1.23  0  1  0  0  0  1  5   1   4   0   1   1   2
2   2  2 1.23  0  1  0  0  0  1  2   2   0   1   0   2   6
3   3  3 1.23  0  1  0  0  0  0  3   1   6   0   3  NA  NA
4   4  4 1.23  0  1  0  0  0  2  5   4   7   0   4   4   6
5   5  2 1.23  0  1  0  0  0  8  7   0   8   0   3   5   5
6   6  2 1.23  0  1  0  0  0  0  6   6   0   6   2   7  NA
7   7  3 1.23  0  1  0  0  0  1  1   7   0   5   1   8   3
8   8  4 1.23  0  1  0  0  0  3  3   5   0   4   7  NA  NA
9   9  4 1.23  0  1  0  0  0  4  1   5   7   0   8   8  NA
10 10  4 1.23  0  1  0  0  0  5  1   6   6   0   7   5   3

For each id, starting from column X5 to until X7 or X8, there is a block of 3 or 4 0s. After this block, I need to grab the numerical values until the next 0.

My desired output would be:

> df_new
   id Gamma1 Gamma2 Gamma3 Gamma4
1   1      1      5      1      4
2   2      1      2      2     NA
3   3      3      1      6     NA
4   4      2      5      4      7
5   5      8      7     NA     NA
6   6      6      6     NA     NA
7   7      1      1      7     NA
8   8      3      3      5     NA
9   9      4      1      5      7
10 10      5      1      6      6

Solution

  • Given that the condition always starts at column "X5" (column 6), using pivot_longer with a consecutive pivot_wider to construct the desired data frame.

    Explanation: With condition == 0 starting at the desired column, a subsequent consecutive_id always gives the expected sequence of numbers on id 3.

    library(dplyr)
    library(tidyr)
    
    df %>% 
      pivot_longer(-id) %>% 
      mutate(cons_id = row_number() > 4 & value == 0, 
             cons_id = consecutive_id(cons_id) == 3, .by = id) %>% 
      filter(cons_id) %>% 
      mutate(name = paste0("Gamma", row_number()), .by = id) %>% 
      pivot_wider(id_cols=id)
    # A tibble: 10 × 5
          id Gamma1 Gamma2 Gamma3 Gamma4
       <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
     1     1      1      5      1      4
     2     2      1      2      2     NA
     3     3      3      1      6     NA
     4     4      2      5      4      7
     5     5      8      7     NA     NA
     6     6      6      6     NA     NA
     7     7      1      1      7     NA
     8     8      3      3      5     NA
     9     9      4      1      5      7
    10    10      5      1      6      6