
Assign a name for each unique value in another column


I have a column C1 in a data frame df and would like to add a new column, C2, with an id based on each unique value in C1.
However, I would like C2 to use a specific name (Group) followed by a number that starts at 01 rather than 1, since I will have up to 13 groups and want them to sort properly. I would also like the last unique value (Z) to keep its own name, so that C2 looks like this:

   C1    C2     
   <chr> <chr>  
 1 A     Group01
 2 A     Group01
 3 A     Group01
 4 A     Group01
 5 B     Group02
 6 B     Group02
 7 B     Group02
 8 B     Group02
 9 C     Group03
10 C     Group03
11 C     Group03
12 C     Group03
13 Z     Z      
14 Z     Z      
15 Z     Z      
16 Z     Z 
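A short illustration of why the zero-padding matters: R sorts character labels as text, so with 13 groups an unpadded "Group10" would land before "Group2". This is a sketch with made-up label numbers, not data from the question.

```r
# Unpadded labels sort as text, so "Group10" lands before "Group2"
unpadded <- paste0("Group", c(1, 2, 10, 13))
sort(unpadded)
# -> Group1, Group10, Group13, Group2

# Zero-padded labels (width 2) sort in numeric order
padded <- sprintf("Group%02d", c(1, 2, 10, 13))
sort(padded)
# -> Group01, Group02, Group10, Group13
```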

I have tried to get the id with df <- transform(df, id = as.numeric(factor(C1))), but I get this:

   C1      C2 id
1   A Group01  1
2   A Group01  1
3   A Group01  1
4   A Group01  1
5   B Group02  2
6   B Group02  2
7   B Group02  2
8   B Group02  2
9   C Group03  3
10  C Group03  3
11  C Group03  3
12  C Group03  3
13  Z       Z  4
14  Z       Z  4
15  Z       Z  4
16  Z       Z  4 

I guess I could create the new column by putting "Group" in front of the id, but I don't know how to get the id zero-padded so that it starts at 01.
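For reference, the missing step from the attempt above can be sketched in base R: sprintf with "%02d" turns the numeric id into a two-digit, zero-padded string. This assumes, as in the question, that the value to keep unchanged is literally "Z". Note that factor() numbers its levels in alphabetical order, which here matches the order of appearance (A, B, C, Z).

```r
# Sample data matching the question
df <- data.frame(C1 = rep(c("A", "B", "C", "Z"), each = 4))

# Numeric id, as in the attempt above (levels are numbered alphabetically)
df <- transform(df, id = as.numeric(factor(C1)))

# %02d pads the number with a leading zero to width 2: 1 -> "01", 13 -> "13"
df$C2 <- sprintf("Group%02d", df$id)

# Keep the original value for the last group (assumed to be "Z")
df$C2[df$C1 == "Z"] <- "Z"

df$id <- NULL
df
```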


Solution

  • You can use match + unique to get a number for each unique C1 value (in order of first appearance), use sprintf with '%02d' to zero-pad it (01, 02, ...) after the "Group" prefix, and use replace to keep the original value for the last group, Z.

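    library(dplyr)
    
    df <- df %>%
      # tmp: position of each C1 value among the unique values (1, 2, 3, ...)
      mutate(tmp = match(C1, unique(C1)),
             # zero-pad tmp to two digits, but keep "Z" unchanged
             C2 = replace(sprintf('Group%02d', tmp), C1 == 'Z', 'Z')) %>%
      select(-tmp)
    df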
    
    #   C1      C2
    #1   A Group01
    #2   A Group01
    #3   A Group01
    #4   A Group01
    #5   B Group02
    #6   B Group02
    #7   B Group02
    #8   B Group02
    #9   C Group03
    #10  C Group03
    #11  C Group03
    #12  C Group03
    #13  Z       Z
    #14  Z       Z
    #15  Z       Z
    #16  Z       Z
    

    data

    df <- structure(list(C1 = c("A", "A", "A", "A", "B", "B", "B", "B", 
    "C", "C", "C", "C", "Z", "Z", "Z", "Z")), row.names = c(NA, -16L
    ), class = "data.frame")
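The answer hard-codes 'Z'. If the last group's name is not known in advance, the same match + unique + sprintf idea can be wrapped so that whichever unique value appears last keeps its own name. The helper label_groups below is hypothetical (not part of the answer or any package) and assumes "last unique value" means the one that first appears last in C1. Two-digit padding covers up to 99 groups.

```r
# Hypothetical helper: label each unique value in order of appearance as
# Group01, Group02, ..., but keep the last unique value's own name
label_groups <- function(x, prefix = "Group") {
  ids <- match(x, unique(x))                 # 1, 1, ..., 2, 2, ... by first appearance
  labels <- sprintf("%s%02d", prefix, ids)   # zero-padded to two digits
  is_last <- ids == max(ids)                 # rows holding the last unique value
  labels[is_last] <- as.character(x[is_last])
  labels
}

df <- data.frame(C1 = rep(c("A", "B", "C", "Z"), each = 4))
df$C2 <- label_groups(df$C1)
df
```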