I have a dataset in which the same id appears in different groups:
df <- read.table(text='id group
1 A
2 A
2 A
1 B
1 B
2 B
2 C
2 C
1 C
2 D
1 D
1 D', header=TRUE)
I want to rename the duplicated values under column id
that are grouped by another column group
. The expected output is:
id group
1 A
2 A
2 A
1_2 B
1_2 B
2_2 B
2_3 C
2_3 C
1_3 C
2_4 D
1_4 D
1_4 D
How do I do that?
Here is a data.table
approach using rleid()
to generate a run-length id for each unique id
and group
combination. We can then just paste()
that number to the existing id
, where it is >1
.
library(data.table)
setDT(df)
df[, id_num := rleid(group), id][
,
id := fifelse(
id_num == 1,
as.character(id),
paste(id, id_num, sep = "_")
)
][, `:=`(id_num = NULL)]
print(df)
# id group
# <char> <char>
# 1: 1 A
# 2: 2 A
# 3: 2 A
# 4: 1_2 B
# 5: 1_2 B
# 6: 2_2 B
# 7: 2_3 C
# 8: 2_3 C
# 9: 1_3 C
# 10: 2_4 D
# 11: 1_4 D
# 12: 1_4 D