I have a column C1 in a df and would like to get a new column, C2, with an id based on each unique value in C1.
But I would like C2 to use a specific prefix (Group) followed by a number, counting from 01 rather than 1, as I will have up to 13 groups and want to group them properly.
I would also like to keep the same name for the last unique value (Z), so that C2 looks like this:
C1 C2
<chr> <chr>
1 A Group01
2 A Group01
3 A Group01
4 A Group01
5 B Group02
6 B Group02
7 B Group02
8 B Group02
9 C Group03
10 C Group03
11 C Group03
12 C Group03
13 Z Z
14 Z Z
15 Z Z
16 Z Z
I have tried to get the id, e.g.
df <- transform(df,id=as.numeric(factor(C1)))
But I get this.
C1 C2 id
1 A Group01 1
2 A Group01 1
3 A Group01 1
4 A Group01 1
5 B Group02 2
6 B Group02 2
7 B Group02 2
8 B Group02 2
9 C Group03 3
10 C Group03 3
11 C Group03 3
12 C Group03 3
13 Z Z 4
14 Z Z 4
15 Z Z 4
16 Z Z 4
I guess I could create a new column with the "Group" prefix, but I don't know how to get an id that starts from 01.
You can use match + unique to get a unique number for each C1 value, and keep the value the same as C1 for the last value (Z). Use sprintf to zero-pad the number so that 1 becomes 01.
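library(dplyr)

df <- df %>%
  # tmp: position of each C1 value among the unique values (1, 2, 3, ...)
  mutate(tmp = match(C1, unique(C1)),
         # zero-pad tmp to two digits, then put 'Z' back on the Z rows
         C2 = replace(sprintf('Group%02d', tmp), C1 == 'Z', 'Z')) %>%
  select(-tmp)
df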
# C1 C2
#1 A Group01
#2 A Group01
#3 A Group01
#4 A Group01
#5 B Group02
#6 B Group02
#7 B Group02
#8 B Group02
#9 C Group03
#10 C Group03
#11 C Group03
#12 C Group03
#13 Z Z
#14 Z Z
#15 Z Z
#16 Z Z
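The same idea can be sketched in base R, without dplyr. This is only an illustration under one assumption: instead of hard-coding 'Z', it keeps whichever unique value of C1 appears last, which is how I read "the last unique value" in the question. The names id and keep are made up for this sketch.

```r
# Example data, same shape as in the question
df <- data.frame(C1 = rep(c("A", "B", "C", "Z"), each = 4),
                 stringsAsFactors = FALSE)

# Position of each value among the unique values, in order of first appearance
id <- match(df$C1, unique(df$C1))

# Assumption: the value to leave unchanged is the last unique value in C1
keep <- tail(unique(df$C1), 1)

# Zero-padded label for every row, except rows holding the kept value
df$C2 <- ifelse(df$C1 == keep, df$C1, sprintf("Group%02d", id))

# One row per distinct C1/C2 pair, to check the mapping
unique(df)
```

Because the format is %02d, a 13th group would come out as Group13, so the labels sort correctly as text for up to 99 groups.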
data
df <- structure(list(C1 = c("A", "A", "A", "A", "B", "B", "B", "B",
"C", "C", "C", "C", "Z", "Z", "Z", "Z")), row.names = c(NA, -16L
), class = "data.frame")