
Enumerate a grouping variable in a tibble


I would like to know how to use row_number() or anything else to turn a grouping variable into an integer index.

library(dplyr)

tibble_test <- tibble(A = letters[1:10], group = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "D"))

# to get the enumeration inside each group of 'group'
tibble_test %>% 
  group_by(group) %>% 
  mutate(G1 = row_number())


But I would like to have this output:


# A tibble: 10 x 4
   A     group    G1    G2
   <chr> <chr> <dbl> <dbl>
 1 a     A         1     1
 2 b     A         2     1
 3 c     A         3     1
 4 d     B         1     2
 5 e     B         2     2
 6 f     C         1     3
 7 g     C         2     3
 8 h     C         3     3
 9 i     C         4     3
10 j     D         1     4

My question is how to get the column G2. I know I could convert the 'group' variable to a factor and then to an integer (after the tibble is arranged), but I would like to know whether it can be done by counting.


Solution

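  • You just need one more step: add the group index with group_indices(). In dplyr 1.0.0 and later, calling group_indices() with no arguments is deprecated in favor of cur_group_id(), which returns the same index. Be aware that the index follows the sort order of the values in 'group', so it matches the order of appearance only when the data is already sorted by group.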

    library(dplyr)
    
    tibble_test <- tibble(A = letters[1:10], group = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "D"))
    
    # to get the enumeration inside each group of 'group'
    tibble_test %>% 
      group_by(group) %>% 
      mutate(G1 = row_number(),
             G2 = group_indices())
    
    # A tibble: 10 x 4
    # Groups:   group [4]
       A     group    G1    G2
       <chr> <chr> <int> <int>
     1 a     A         1     1
     2 b     A         2     1
     3 c     A         3     1
     4 d     B         1     2
     5 e     B         2     2
     6 f     C         1     3
     7 g     C         2     3
     8 h     C         3     3
     9 i     C         4     3
    10 j     D         1     4
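
For newer dplyr versions, the same idea can be sketched with cur_group_id(). This is a minimal sketch assuming dplyr 1.0.0 or later. The column G2_by_appearance and the match()/unique() step are not part of the answer above; they are a hypothetical addition showing one way to number groups by first appearance rather than by sorted order.

```r
library(dplyr)

tibble_test <- tibble(
  A = letters[1:10],
  group = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "D")
)

result <- tibble_test %>%
  group_by(group) %>%
  mutate(
    G1 = row_number(),    # position of the row within its group
    G2 = cur_group_id()   # index of the current group (assumes dplyr >= 1.0.0)
  ) %>%
  ungroup() %>%
  # Hypothetical extra column: number groups in order of first appearance,
  # using base R match() against the unique group values
  mutate(G2_by_appearance = match(group, unique(group)))

print(result)
```

With this data 'group' is already sorted, so G2 and G2_by_appearance agree. They would differ if, for example, the "B" rows came before the "A" rows.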