r, group-by, identifier, mutate

Creating row identifier within a group


I have data in the following form, with more than a million rows. I want to add a column that numbers the rows within each group of Item3. The first two columns are irrelevant; I included them only to show that the dataset has other columns. I tried cumsum and group_indices, but neither worked.

Item1 Item2 Item3
One Two A
One Two A
One Two A
One Two B
One Two B
One Two C

Desired output:

Item1 Item2 Item3 Identifier
One Two A 1
One Two A 2
One Two A 3
One Two B 1
One Two B 2
One Two C 1

Solution

    library(tidyverse)
    
    data <- tibble(
      Item1 = c("One", "One", "One", "One", "One", "One"),
      Item2 = c("Two", "Two", "Two", "Two", "Two", "Two"),
      Item3 = c("A", "A", "A", "B", "B", "C")
    )
    
    # Number the rows within each Item3 group (.by requires dplyr 1.1.0 or later)
    data %>% 
      mutate(ID = row_number(), .by = Item3)
    
      Item1 Item2 Item3     ID
      <chr> <chr> <chr> <int>
    1 One   Two   A         1
    2 One   Two   A         2
    3 One   Two   A         3
    4 One   Two   B         1
    5 One   Two   B         2
    6 One   Two   C         1
    

Credit to Chamkrai for the .by = Item3 idea 😃
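
If your installed dplyr is older than 1.1.0, the .by argument is not available. As a rough sketch (not part of the original answer), the same numbering can be produced with an explicit group_by(), or with ave() from base R. The names with_group_by and base_id are only illustrative.

```r
library(dplyr)

data <- tibble(
  Item1 = rep("One", 6),
  Item2 = rep("Two", 6),
  Item3 = c("A", "A", "A", "B", "B", "C")
)

# Older dplyr: group explicitly, number the rows, then drop the grouping
with_group_by <- data %>%
  group_by(Item3) %>%
  mutate(ID = row_number()) %>%
  ungroup()

# Base R: ave() applies seq_along() separately within each Item3 value
base_id <- ave(seq_along(data$Item3), data$Item3, FUN = seq_along)

with_group_by$ID  # 1 2 3 1 2 1
base_id           # 1 2 3 1 2 1
```

Both variants number rows in their existing order within each group, so sort the data first if the identifier should follow a particular order.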