Search code examples
rdplyrgrouping

How to count the cumulative number of subgroupings using dplyr?


I'm trying to run the number of cumulative subgroupings using dplyr, as illustrated and explanation in the image below. I am trying to solve for Flag2 in the image. Any recommendations for how to do this?

Beneath the image I also have the reproducible code that runs all columns up through Flag1 which works fine.

enter image description here

Reproducible code:

library(dplyr)

myData <- 
  data.frame(
    Element = c("A","B","B","B","B","B","A","C","C","C","C","C"),
    Group = c(0,0,1,1,2,2,0,3,3,0,0,0)
  )


excelCopy <- myData %>% 
  group_by(Element) %>% 
  mutate(Element_Count = row_number()) %>% 
  mutate(Flag1 = case_when(Group > 0 ~ match(Group, unique(Group)),TRUE ~ Element_Count)) %>%
  ungroup()
  
print.data.frame(excelCopy)

Solution

  • Using row_number and setting 0 values to NA

    library(dplyr)
    
    excelCopy |> 
      group_by(Element, Group) |> 
      mutate(Flag2 = ifelse(Group == 0, NA, row_number()))
    
     Element Group Element_Count Flag1 Flag2
       <chr>   <dbl>         <int> <int> <int>
     1 A           0             1     1    NA
     2 B           0             1     1    NA
     3 B           1             2     2     1
     4 B           1             3     2     2
     5 B           2             4     3     1
     6 B           2             5     3     2
     7 A           0             2     2    NA
     8 C           3             1     1     1
     9 C           3             2     1     2
    10 C           0             3     3    NA
    11 C           0             4     4    NA
    12 C           0             5     5    NA