I have created a data frame with a group column and an individual identifier that combines the group name with a number formatted as a standardised three-digit code:
library(stringr)
group <- rep(c("A", "B", "C"), each = 3)
# pad with the character "0" to a width of three digits
df <- data.frame(group, indiv = paste0(group, str_pad(1:9, width = 3, side = "left", pad = "0")))
All well and good, but how would I go about resetting the number in the identifier each time the prefix changes, to get this ideal result:
df2 <- data.frame(group, indiv = c("A001", "A002", "A003",
"B001", "B002", "B003",
"C001", "C002", "C003"))
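We can group by 'group', use `substr` to extract the first character (the prefix) from 'indiv', and use `sprintf` to format the within-group sequence from `row_number()` as a zero-padded three-digit number. This assumes a single-character prefix.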
library(dplyr)
df %>%
group_by(group) %>%
mutate(indiv = sprintf('%s%03d', substr(indiv, 1, 1), row_number())) %>%
ungroup
Output:
# A tibble: 9 × 2
group indiv
<chr> <chr>
1 A A001
2 A A002
3 A A003
4 B B001
5 B B002
6 B B003
7 C C001
8 C C002
9 C C003
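All three approaches rely on the same `sprintf` format string. As a quick base R illustration of what it does: `%s` inserts the prefix and `%03d` formats an integer zero-padded to a width of three. The values below are only for demonstration.

```r
# '%s' inserts the prefix string; '%03d' zero-pads an integer to width 3
ids <- sprintf('%s%03d', "A", 1:3)
print(ids)  # "A001" "A002" "A003"

# Numbers wider than three digits are not truncated
print(sprintf('%03d', 1234L))  # "1234"
```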
Or, more compactly, with data.table (note that `setDT` converts `df` to a data.table by reference):
library(data.table)
setDT(df)[, indiv := sprintf('%s%03d', group, rowid(group))]
Or using base R:
df$indiv <- with(df, sprintf('%s%03d', group,
ave(seq_along(group), group, FUN = seq_along)))
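Here is a minimal, self-contained base R sketch of the last approach. It rebuilds the question's data with `sprintf` in place of `str_pad` so that it needs no packages, and checks the result against the desired `df2`. The key piece is `ave(..., FUN = seq_along)`, which applies `seq_along` within each level of `group` and so gives a counter that restarts at 1 for every group. The final lines show that the data does not need to be sorted: rows are numbered in order of appearance within each group.

```r
# Rebuild the example data from the question (sprintf substitutes for str_pad)
group <- rep(c("A", "B", "C"), each = 3)
df <- data.frame(group, indiv = paste0(group, sprintf("%03d", 1:9)))

# Within-group counter: seq_along is applied separately to each group
counter <- ave(seq_along(df$group), df$group, FUN = seq_along)
print(counter)  # 1 2 3 1 2 3 1 2 3

# Rebuild the identifier from the group prefix and the counter
df$indiv <- sprintf('%s%03d', df$group, counter)

# Desired result from the question
df2 <- data.frame(group, indiv = c("A001", "A002", "A003",
                                   "B001", "B002", "B003",
                                   "C001", "C002", "C003"))
stopifnot(identical(df$indiv, df2$indiv))

# Unsorted groups also work: each group is numbered in order of appearance
g <- c("B", "A", "B", "A")
print(ave(seq_along(g), g, FUN = seq_along))  # 1 1 2 2
```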