I want to build a large table from data that come from several sources, but some of the names are identical, so after merging it is impossible to tell which source a row came from.
I have a solution in mind, but I don't know if it's possible to achieve.
Here is a part of my data:
name id sym
ENSG00000135821 2752 GLUL
ENSG00000135821 2752 GLUL
ENSG00000135821 2752 GLUL
ENSG00000135821 2752 GLUL
As you can see, I cannot tell where each row came from. My idea is to modify the values in the name column of each separate data frame before merging them, so that the merged data frame looks like this:
name id sym
ENSG00000135821_sample1 2752 GLUL
ENSG00000135821_sample2 2752 GLUL
ENSG00000135821_sample3 2752 GLUL
ENSG00000135821_sample4 2752 GLUL
Is it possible to append a suffix to every value in a data frame column while keeping the original name?
For a single data frame I would like to get:
name id sym
ENSG00000135821_sample1 2752 GLUL
ENSG00000182667_sample1 50863 NTM
ENSG00000155495_sample1 9947 MAGEC1
ENSG00000198959_sample1 7052 TGM2
Thank you!
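For the single-data-frame case above, where every row of one source gets the same label, a minimal base R sketch is to paste a fixed suffix onto the column. The data frame `df_a` and the label `"_sample1"` are hypothetical stand-ins for one source; the original ID stays as the prefix and can be recovered with `sub()`.

```r
# Hypothetical data frame standing in for one source
df_a <- data.frame(name = c("ENSG00000135821", "ENSG00000182667"),
                   id   = c(2752L, 50863L),
                   sym  = c("GLUL", "NTM"))

# Append a fixed source label to every value in the name column;
# the original identifier is kept as the prefix
df_a$name <- paste0(df_a$name, "_sample1")

# The original identifier can be recovered by stripping the suffix again
original_ids <- sub("_sample[0-9]+$", "", df_a$name)
```

Repeating this with a different label for each data frame before merging keeps every row traceable to its source.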
A dplyr solution: group by id and sym, and use seq_along() to get consecutive numbers within each group.
df1 <- 'name id sym
ENSG00000135821 2752 GLUL
ENSG00000135821 2752 GLUL
ENSG00000135821 2752 GLUL
ENSG00000135821 2752 GLUL'
df1 <- read.table(textConnection(df1), header = TRUE)
df2 <- "name id sym
ENSG00000135821 2752 GLUL
ENSG00000182667 50863 NTM
ENSG00000155495 9947 MAGEC1
ENSG00000198959 7052 TGM2"
df2 <- read.table(textConnection(df2), header = TRUE)
suppressPackageStartupMessages(
library(dplyr)
)
df1 %>%
  group_by(id, sym) %>%
  mutate(name = paste0(name, "_sample", seq_along(name))) %>%
  ungroup()
#> # A tibble: 4 × 3
#> name id sym
#> <chr> <int> <chr>
#> 1 ENSG00000135821_sample1 2752 GLUL
#> 2 ENSG00000135821_sample2 2752 GLUL
#> 3 ENSG00000135821_sample3 2752 GLUL
#> 4 ENSG00000135821_sample4 2752 GLUL
Created on 2022-10-14 with reprex v2.0.2
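For comparison, the same group-wise numbering can be sketched in base R with `ave()`, which applies a function within each group and returns the results in the original row order. This is a swapped-in alternative to the dplyr code, not part of the original answer; the data frame is rebuilt inline so the snippet runs on its own.

```r
df1 <- data.frame(name = rep("ENSG00000135821", 4),
                  id   = rep(2752L, 4),
                  sym  = rep("GLUL", 4))

# ave() splits the row indices by id and sym, applies seq_along within
# each group, and puts the running counts back in the original row order
counter <- ave(seq_len(nrow(df1)), df1$id, df1$sym, FUN = seq_along)
df1$name <- paste0(df1$name, "_sample", counter)
```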
This can be written as a function and applied to any data set, as long as the column names are the same: name, id and sym.
newname <- function(x) {
  x %>%
    group_by(id, sym) %>%
    mutate(name = paste0(name, "_sample", seq_along(name))) %>%
    ungroup()
}
newname(df1)
#> # A tibble: 4 × 3
#> name id sym
#> <chr> <int> <chr>
#> 1 ENSG00000135821_sample1 2752 GLUL
#> 2 ENSG00000135821_sample2 2752 GLUL
#> 3 ENSG00000135821_sample3 2752 GLUL
#> 4 ENSG00000135821_sample4 2752 GLUL
newname(df2)
#> # A tibble: 4 × 3
#> name id sym
#> <chr> <int> <chr>
#> 1 ENSG00000135821_sample1 2752 GLUL
#> 2 ENSG00000182667_sample1 50863 NTM
#> 3 ENSG00000155495_sample1 9947 MAGEC1
#> 4 ENSG00000198959_sample1 7052 TGM2
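Note that the seq_along() numbering depends on a row's position within its id/sym group, not on which data frame it came from (every row of df2 gets _sample1 because each group has one row). If the goal is to record the source, a sketch under that assumption is to label each data frame before merging with `bind_rows()` and its `.id` argument. The list names `sample1` and `sample2` are hypothetical source labels.

```r
suppressPackageStartupMessages(library(dplyr))

# Two hypothetical sources that share an identifier
df1 <- data.frame(name = "ENSG00000135821", id = 2752L, sym = "GLUL")
df2 <- data.frame(name = c("ENSG00000135821", "ENSG00000182667"),
                  id   = c(2752L, 50863L),
                  sym  = c("GLUL", "NTM"))

# .id creates a column holding the list name each row came from;
# it is pasted onto name and then dropped
merged <- bind_rows(list(sample1 = df1, sample2 = df2), .id = "source") %>%
  mutate(name = paste0(name, "_", source)) %>%
  select(-source)
```

Here the suffix encodes the source itself, so the two ENSG00000135821 rows stay distinguishable regardless of row order.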