Modify names in a df column

I want to build one large table from data that come from several sources, but some of the names are identical, so after merging it is impossible to tell which source a row came from.

I have a solution in mind, but I don't know if it's possible to achieve.

Here is a part of my data:

name            id      sym
ENSG00000135821 2752    GLUL
ENSG00000135821 2752    GLUL
ENSG00000135821 2752    GLUL
ENSG00000135821 2752    GLUL

As you can see, I cannot tell where each row came from. My idea is to modify the values in the name column of the separate data frames before merging them, so that the merged df looks like this:

name                    id      sym
ENSG00000135821_sample1 2752    GLUL
ENSG00000135821_sample2 2752    GLUL
ENSG00000135821_sample3 2752    GLUL
ENSG00000135821_sample4 2752    GLUL

Is it possible to append a suffix to every value in a df column while keeping the original name as its prefix?

For a single df, I would like to get:

name                    id      sym
ENSG00000135821_sample1 2752    GLUL
ENSG00000182667_sample1 50863   NTM
ENSG00000155495_sample1 9947    MAGEC1
ENSG00000198959_sample1 7052    TGM2

Thank you!

Solution

A dplyr solution. Group by id and sym, then use seq_along to number the rows within each group consecutively.

df1 <- 'name            id      sym
ENSG00000135821 2752    GLUL
ENSG00000135821 2752    GLUL
ENSG00000135821 2752    GLUL
ENSG00000135821 2752    GLUL'
df1 <- read.table(textConnection(df1), header = TRUE)

df2 <- "name            id      sym
ENSG00000135821 2752    GLUL
ENSG00000182667 50863   NTM
ENSG00000155495 9947    MAGEC1
ENSG00000198959 7052    TGM2"
df2 <- read.table(textConnection(df2), header = TRUE)

suppressPackageStartupMessages(
  library(dplyr)
)

df1 %>%
  group_by(id, sym) %>%
  mutate(name = paste0(name, "_sample", seq_along(name))) %>%
  ungroup()
#> # A tibble: 4 × 3
#>   name                       id sym  
#>   <chr>                   <int> <chr>
#> 1 ENSG00000135821_sample1  2752 GLUL 
#> 2 ENSG00000135821_sample2  2752 GLUL 
#> 3 ENSG00000135821_sample3  2752 GLUL 
#> 4 ENSG00000135821_sample4  2752 GLUL

Created on 2022-10-14 with reprex v2.0.2
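
If dplyr is not available, the same group-wise numbering can probably be done in base R with ave(). This is only a sketch: df_dup is a hypothetical copy of the duplicated rows from the question, built with data.frame() so the example runs on its own.

```r
# Hypothetical copy of the four duplicated rows from the question.
df_dup <- data.frame(
  name = rep("ENSG00000135821", 4),
  id   = rep(2752L, 4),
  sym  = rep("GLUL", 4)
)

# ave() applies FUN within each combination of the grouping vectors,
# so seq_along() restarts at 1 for every (id, sym) group.
n <- ave(seq_along(df_dup$name), df_dup$id, df_dup$sym, FUN = seq_along)

# Append the within-group counter to the original name.
df_dup$name <- paste0(df_dup$name, "_sample", n)
df_dup
```

As with the dplyr version, the counter reflects the position of a row inside its (id, sym) group, not the data frame the row originally came from.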

This can be written as a function and applied to any data set, as long as the column names are the same: name, id and sym.

newname <- function(x) {
  x %>%
    group_by(id, sym) %>%
    mutate(name = paste0(name, "_sample", seq_along(name))) %>%
    ungroup()
}

newname(df1)
#> # A tibble: 4 × 3
#>   name                       id sym  
#>   <chr>                   <int> <chr>
#> 1 ENSG00000135821_sample1  2752 GLUL 
#> 2 ENSG00000135821_sample2  2752 GLUL 
#> 3 ENSG00000135821_sample3  2752 GLUL 
#> 4 ENSG00000135821_sample4  2752 GLUL

newname(df2)
#> # A tibble: 4 × 3
#>   name                       id sym   
#>   <chr>                   <int> <chr> 
#> 1 ENSG00000135821_sample1  2752 GLUL  
#> 2 ENSG00000182667_sample1 50863 NTM   
#> 3 ENSG00000155495_sample1  9947 MAGEC1
#> 4 ENSG00000198959_sample1  7052 TGM2
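
Note that the counter above numbers repeated rows inside one data frame. If the suffix should instead identify the source data frame, as in the question's last example, one option is to tag each data frame with its own label before binding them. The sketch below assumes the sources are held in a named list; src1, src2 and add_source() are hypothetical names used only for illustration and are not part of dplyr.

```r
library(dplyr)

# Hypothetical stand-ins for two source tables that share one gene.
src1 <- data.frame(name = c("ENSG00000135821", "ENSG00000182667"),
                   id   = c(2752L, 50863L),
                   sym  = c("GLUL", "NTM"))
src2 <- data.frame(name = "ENSG00000135821", id = 2752L, sym = "GLUL")

# Illustrative helper: append "_<label>" to every value in the name column.
add_source <- function(x, label) {
  mutate(x, name = paste0(name, "_", label))
}

# The list names serve as the labels; Map() pairs each data frame with its name.
sources <- list(sample1 = src1, sample2 = src2)
merged  <- bind_rows(Map(add_source, sources, names(sources)))
merged
```

Here the shared gene ends up as ENSG00000135821_sample1 and ENSG00000135821_sample2, so the origin of each row stays visible after merging. Keeping the label in a separate column instead (for example with the .id argument of bind_rows) would leave the original name untouched, which may be easier to join on later.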
