
How to reset a numerical sequence after a new prefix in an R vector


I have created a data frame with a group column and an individual identifier that combines the group name with a number formatted as a standardised three-digit code:

library(stringr)
group <- rep(c("A", "B", "C"), each = 3)
df <- data.frame(group, indiv = paste0(group, str_pad(1:9, width = 3, side = "left", pad = "0")))

All well and good, but the numbers run from 001 to 009 across all groups. How would I restart the numbering each time there is a new prefix, to get this ideal result:

df2 <- data.frame(group, indiv = c("A001", "A002", "A003", 
                                   "B001", "B002", "B003", 
                                   "C001", "C002", "C003"))

Solution

  • We can group by 'group', use substr to extract the prefix (the first character) from 'indiv', and use sprintf to append the within-group row number (row_number()), zero-padded to three digits:

    library(dplyr)
    df %>% 
      group_by(group) %>% 
      mutate(indiv = sprintf('%s%03d', substr(indiv, 1, 1), row_number())) %>%
      ungroup()
    

    Output:

    # A tibble: 9 × 2
      group indiv
      <chr> <chr>
    1 A     A001 
    2 A     A002 
    3 A     A003 
    4 B     B001 
    5 B     B002 
    6 B     B003 
    7 C     C001 
    8 C     C002 
    9 C     C003 
    

    Or compactly with data.table

    library(data.table)
    setDT(df)[, indiv := sprintf('%s%03d', group, rowid(group))]
    

    Or using base R

    df$indiv <- with(df, sprintf('%s%03d', group,
                                 ave(seq_along(group), group, FUN = seq_along)))
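
As a quick sanity check of the base R approach, here is a minimal self-contained sketch that needs no packages. It rebuilds the question's data with sprintf instead of str_pad (an assumption made only to avoid the stringr dependency), and the names `counter` and `expected` are illustrative, not from the original post. It shows what the format string does on its own, then confirms that the per-group counter reproduces the desired identifiers:

```r
# '%s' inserts the prefix as-is; '%03d' zero-pads an integer to width 3.
stopifnot(identical(sprintf('%s%03d', "A", 1:3), c("A001", "A002", "A003")))

# Rebuild the question's starting data (numbers run 001..009 across groups).
group <- rep(c("A", "B", "C"), each = 3)
df <- data.frame(group, indiv = sprintf('%s%03d', group, 1:9))

# ave() applies seq_along() within each group, so the counter restarts at 1
# whenever the group changes.
counter <- ave(seq_along(df$group), df$group, FUN = seq_along)
df$indiv <- sprintf('%s%03d', df$group, counter)

# Hypothetical check against the result requested in the question.
expected <- c("A001", "A002", "A003",
              "B001", "B002", "B003",
              "C001", "C002", "C003")
stopifnot(identical(df$indiv, expected))
print(df)
```

Because the prefix is taken from the group column rather than from the first character of indiv, this variant should also work when group names are longer than one character.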