r dataframe seurat

New ID column depending on another column in R


I want to generate a new ID column in my data frame based on another column. My df looks something like this:

> TCR <- c("CAAETSGSRLTF;CASSQEGTGVYEQYF","CGSRLTF;CASSQEGTGVYEQYF","CAAETSGSRLTF;CASSQEGT", "CAAETSGSRLTF;CASSQEGTGVYEQYF")
> df <- data.frame(cdr3 = TCR)
> df
                          cdr3
1 CAAETSGSRLTF;CASSQEGTGVYEQYF
2      CGSRLTF;CASSQEGTGVYEQYF
3        CAAETSGSRLTF;CASSQEGT
4 CAAETSGSRLTF;CASSQEGTGVYEQYF

I want to add a new column, df$ID, that looks at df$cdr3 and assigns a new ID to each distinct value; if a value is repeated, it should reuse the ID that was assigned to it before. So it becomes something like this:

> df
                          cdr3 ID
1 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
2      CGSRLTF;CASSQEGTGVYEQYF X2
3        CAAETSGSRLTF;CASSQEGT X3
4 CAAETSGSRLTF;CASSQEGTGVYEQYF X1

Thanks a lot guys


Solution

  • We can use match() in base R to look up each 'cdr3' value in the vector of unique values, take the resulting index, and paste it after "X":

    df$ID <- paste0("X", match(df$cdr3, unique(df$cdr3)))
    

    -output

    > df
                              cdr3 ID
    1 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
    2      CGSRLTF;CASSQEGTGVYEQYF X2
    3        CAAETSGSRLTF;CASSQEGT X3
    4 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
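
To make the one-liner easier to follow, here is a minimal sketch that splits it into its steps, using the example data from the question. The intermediate names (`uniq`, `idx`, `id_factor`) are only illustrative and not part of the original answer. The factor-based variant at the end is an alternative added for comparison, on the assumption that the levels are fixed to the order of first appearance.

```r
# Rebuild the example data from the question (column named cdr3).
cdr3 <- c("CAAETSGSRLTF;CASSQEGTGVYEQYF",
          "CGSRLTF;CASSQEGTGVYEQYF",
          "CAAETSGSRLTF;CASSQEGT",
          "CAAETSGSRLTF;CASSQEGTGVYEQYF")
df <- data.frame(cdr3 = cdr3)

# Step 1: distinct values, in order of first appearance.
uniq <- unique(df$cdr3)

# Step 2: for each row, the position of its value within uniq.
idx <- match(df$cdr3, uniq)

# Step 3: prefix the index with "X" to build the ID.
df$ID <- paste0("X", idx)

# Alternative (illustrative): a factor whose levels follow first appearance
# yields the same integer codes for this data.
id_factor <- paste0("X", as.integer(factor(df$cdr3, levels = uniq)))
stopifnot(identical(df$ID, id_factor))
```

Because unique() keeps values in order of first appearance, the IDs are numbered in the order the sequences first occur in the column, so rows 1 and 4 share X1.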