
Modify names in a df column


I want to build one large table from data that come from different sources, but some of the names are identical, so it is not possible to tell which source a row came from.

I have a solution in mind, but I don't know if it's possible to achieve.

Here is a part of my data:

name            id      sym
ENSG00000135821 2752    GLUL
ENSG00000135821 2752    GLUL
ENSG00000135821 2752    GLUL
ENSG00000135821 2752    GLUL

As you can see, I cannot tell where each row came from. My idea is to modify the values in the name column of each separate data frame before merging them, so that the merged df looks like this:

name                    id      sym
ENSG00000135821_sample1 2752    GLUL
ENSG00000135821_sample2 2752    GLUL
ENSG00000135821_sample3 2752    GLUL
ENSG00000135821_sample4 2752    GLUL

Is it possible to append a suffix to all the names in a df column while keeping the original name?

For a single df I would like to get:

name                    id      sym
ENSG00000135821_sample1 2752    GLUL
ENSG00000182667_sample1 50863   NTM
ENSG00000155495_sample1 9947    MAGEC1
ENSG00000198959_sample1 7052    TGM2

Thank you!
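The idea described in the question (tag each source data frame before merging) can be sketched in base R with paste0() and rbind(). This is a minimal sketch, not the answer below: the helper name label_source and the labels "sample1"/"sample2" are hypothetical, and the two small data frames simply reuse rows from the tables above.

```r
# Hypothetical helper: append a source label to every value in the name column.
# The original name is kept as the prefix, e.g. ENSG00000135821 -> ENSG00000135821_sample1
label_source <- function(df, label) {
  df$name <- paste0(df$name, "_", label)
  df
}

# Two small source data frames built from rows shown in the question
df_a <- data.frame(name = c("ENSG00000135821", "ENSG00000182667"),
                   id   = c(2752L, 50863L),
                   sym  = c("GLUL", "NTM"))
df_b <- data.frame(name = "ENSG00000135821", id = 2752L, sym = "GLUL")

# Label each source first, then stack them into one table
merged <- rbind(label_source(df_a, "sample1"),
                label_source(df_b, "sample2"))
merged
```

Because the label is attached before rbind(), the GLUL row from each source stays distinguishable after merging.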


Solution

  • A dplyr solution: group by id and sym and use seq_along to number the rows consecutively within each group.

    df1 <- 'name            id      sym
    ENSG00000135821 2752    GLUL
    ENSG00000135821 2752    GLUL
    ENSG00000135821 2752    GLUL
    ENSG00000135821 2752    GLUL'
    df1 <- read.table(textConnection(df1), header = TRUE)
    
    df2 <- "name                    id      sym
    ENSG00000135821 2752    GLUL
    ENSG00000182667 50863   NTM
    ENSG00000155495 9947    MAGEC1
    ENSG00000198959 7052    TGM2"
    df2 <- read.table(textConnection(df2), header = TRUE)
    
    suppressPackageStartupMessages(
      library(dplyr)
    )
    
    df1 %>%
      group_by(id, sym) %>%
      mutate(name = paste0(name, "_sample", seq_along(name))) %>%
      ungroup()
    #> # A tibble: 4 × 3
    #>   name                       id sym  
    #>   <chr>                   <int> <chr>
    #> 1 ENSG00000135821_sample1  2752 GLUL 
    #> 2 ENSG00000135821_sample2  2752 GLUL 
    #> 3 ENSG00000135821_sample3  2752 GLUL 
    #> 4 ENSG00000135821_sample4  2752 GLUL
    

    Created on 2022-10-14 with reprex v2.0.2
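For readers who want to avoid the dplyr dependency, the same within-group numbering can be sketched in base R with ave(). This is an alternative technique, not part of the original answer; the data frame df_mixed is a hypothetical mix of two groups, chosen to show that the counter restarts for each id/sym combination.

```r
df_mixed <- data.frame(name = c("ENSG00000135821", "ENSG00000182667", "ENSG00000135821"),
                       id   = c(2752L, 50863L, 2752L),
                       sym  = c("GLUL", "NTM", "GLUL"))

# ave() splits the row positions by the id/sym combination, applies seq_along
# within each group and puts the results back in the original row order
idx <- ave(seq_along(df_mixed$name), df_mixed$id, df_mixed$sym, FUN = seq_along)
df_mixed$name <- paste0(df_mixed$name, "_sample", idx)
df_mixed
```

Here the two GLUL rows become _sample1 and _sample2, while the single NTM row gets _sample1, matching what group_by() plus seq_along() does.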

    The dplyr pipeline above can be wrapped in a function and applied to any data set, as long as its columns are named name, id and sym.

    newname <- function(x) {
      x %>%
        group_by(id, sym) %>%
        mutate(name = paste0(name, "_sample", seq_along(name))) %>%
        ungroup()
    }
    
    newname(df1)
    #> # A tibble: 4 × 3
    #>   name                       id sym  
    #>   <chr>                   <int> <chr>
    #> 1 ENSG00000135821_sample1  2752 GLUL 
    #> 2 ENSG00000135821_sample2  2752 GLUL 
    #> 3 ENSG00000135821_sample3  2752 GLUL 
    #> 4 ENSG00000135821_sample4  2752 GLUL
    
    newname(df2)
    #> # A tibble: 4 × 3
    #>   name                       id sym   
    #>   <chr>                   <int> <chr> 
    #> 1 ENSG00000135821_sample1  2752 GLUL  
    #> 2 ENSG00000182667_sample1 50863 NTM   
    #> 3 ENSG00000155495_sample1  9947 MAGEC1
    #> 4 ENSG00000198959_sample1  7052 TGM2
    

    Created on 2022-10-14 with reprex v2.0.2
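newname() hard-codes the column names. A hedged base R sketch that takes them as arguments instead is shown below; the function newname_cols, the data frame genes and its column names gene, entrez and symbol are all hypothetical. Like newname(), it numbers rows within each group, so rows with a unique id/sym pair get _sample1.

```r
# Hypothetical generalisation of newname(): the suffixed column and the
# grouping columns are arguments instead of being fixed to name, id and sym
newname_cols <- function(x, name_col = "name", group_cols = c("id", "sym")) {
  # Number rows 1, 2, ... within each combination of the grouping columns;
  # ave() passes the data frame x[group_cols] to interaction() as a list of factors
  idx <- ave(seq_len(nrow(x)), x[group_cols], FUN = seq_along)
  x[[name_col]] <- paste0(x[[name_col]], "_sample", idx)
  x
}

genes <- data.frame(gene   = c("ENSG00000135821", "ENSG00000135821", "ENSG00000182667"),
                    entrez = c(2752L, 2752L, 50863L),
                    symbol = c("GLUL", "GLUL", "NTM"))

newname_cols(genes, name_col = "gene", group_cols = c("entrez", "symbol"))
```

One consequence worth checking: because the numbering restarts in every data frame, newname(df1) and newname(df2) above both contain ENSG00000135821_sample1. If the merged names must be unique, number the rows after binding the data frames rather than before.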