Tags: r, dplyr, grouping

How to count groupings of elements in base R or dplyr using multiple conditions?


I am trying to number the instances of each element, where rows that share a grouping code ("Group") greater than 0 count as a single instance and each row with Group = 0 counts on its own. Suppose we start with the output DF below, generated by the code immediately beneath it:

   Element Group reSeq
   <chr>   <dbl> <int>
 1 R           0     1
 2 R           0     1
 3 X           0     1
 4 X           1     2
 5 X           1     2
 6 X           0     1
 7 X           0     1
 8 X           0     1
 9 B           0     1
10 R           0     1
11 R           2     2
12 R           2     2
13 X           3     3
14 X           3     3
15 X           3     3

library(dplyr)

myDF <- data.frame(
  Element = c("R","R","X","X","X","X","X","X","B","R","R","R","X","X","X"),
  Group = c(0,0,0,1,1,0,0,0,0,0,2,2,3,3,3)
)

myDF %>% group_by(Element) %>% mutate(reSeq = match(Group, unique(Group)))

Instead, I would like the reSeq column to be calculated as shown below, with explanations to the right:

   Element Group reSeq reSeq explanation
   <chr>   <dbl> <int>
 1 R           0     1  1st instance of R (ungrouped)(Group = 0 means not grouped)
 2 R           0     2  2nd instance of R (ungrouped)(Group = 0 means not grouped)
 3 X           0     1  1st instance of X (ungrouped)(Group = 0 means not grouped)
 4 X           1     2  2nd instance of X (grouped by Group = 1)
 5 X           1     2  2nd instance of X (grouped by Group = 1)
 6 X           0     3  3rd instance of X (ungrouped)
 7 X           0     4  4th instance of X (ungrouped)
 8 X           0     5  5th instance of X (ungrouped)
 9 B           0     1  1st instance of B (ungrouped)
10 R           0     3  3rd instance of R (ungrouped)
11 R           2     4  4th instance of R (grouped by Group = 2)
12 R           2     4  4th instance of R (grouped by Group = 2)
13 X           3     6  6th instance of X (grouped by Group = 3)
14 X           3     6  6th instance of X (grouped by Group = 3)
15 X           3     6  6th instance of X (grouped by Group = 3)

Any recommendations for doing this? If possible, I would like to start from the dplyr code above, since I am fairly familiar with it.


Solution

  • If we use rowid from data.table, we can skip a couple of steps:

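    library(dplyr)
    library(data.table)  # for rowid()
    library(tidyr)       # for fill()

    myDF %>%
      # Running count per Element; NA^TRUE is NA and NA^FALSE is 1, so the count
      # is kept for Group == 0 rows and the first row of each Group, else NA
      mutate(reSeq = rowid(Element) * NA^!(Group == 0 | !duplicated(Group))) %>%
      group_by(Element) %>%
      # Carry the last kept count down over the remaining rows of each group
      fill(reSeq) %>%
      # Renumber the surviving counts as consecutive integers within each Element
      mutate(reSeq = match(reSeq, unique(reSeq))) %>%
      ungroup()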
    

    Output:

    # A tibble: 15 × 3
       Element Group reSeq
       <chr>   <dbl> <int>
     1 R           0     1
     2 R           0     2
     3 X           0     1
     4 X           1     2
     5 X           1     2
     6 X           0     3
     7 X           0     4
     8 X           0     5
     9 B           0     1
    10 R           0     3
    11 R           2     4
    12 R           2     4
    13 X           3     6
    14 X           3     6
    15 X           3     6
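
For a version that stays within dplyr, as the question asked, one possible sketch is below. It is not part of the answer above. It assumes that rows sharing a nonzero Group code are contiguous within each Element, and the helper column `new_instance` is a name made up here for illustration. The idea is to flag every row that starts a new instance (any ungrouped row, or the first row of a run with a nonzero Group) and take a cumulative sum of those flags per Element.

```r
library(dplyr)

myDF <- data.frame(
  Element = c("R","R","X","X","X","X","X","X","B","R","R","R","X","X","X"),
  Group = c(0,0,0,1,1,0,0,0,0,0,2,2,3,3,3)
)

result <- myDF %>%
  group_by(Element) %>%
  mutate(
    # TRUE for every ungrouped row (Group == 0) and for the first row of a
    # run of a nonzero Group code (assumption: such runs are contiguous)
    new_instance = Group == 0 | Group != lag(Group, default = 0),
    # Counting the TRUE flags so far gives the instance number
    reSeq = cumsum(new_instance)
  ) %>%
  ungroup() %>%
  select(-new_instance)

print(result)
```

If the same nonzero Group code can reappear later for the same Element after other rows, this sketch would count it as a new instance, so the contiguity assumption should be checked against the real data first.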