Removing duplicates from a column in R


I have a column containing IDs of varying length, some of which have version numbers.

rownames(x)

"ENSP00000424360.1-D4"
"ENSP00000424360.2-D4"
"ENSP00000424360.3-D4"
"ENSP00000437781-D59"
"XP_010974537.1"
"XP_010974538.1"
"XP_010974538.2"

I want these to be changed into:

"ENSP00000424360"
"ENSP00000424360.1"
"ENSP00000424360.2"
"ENSP00000437781"
"XP_010974537"
"XP_010974538"
"XP_010974538.1"

I can convert the ENSxx or the XPxx IDs separately using

make.unique(substr(rownames(x),1,15))

or

make.unique(substr(rownames(x),1,12))

How can I change the code to get the desired result?


Solution

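  • Strip the version suffix (everything from the first ".") with the inner sub(), strip any remaining suffix starting with "-" with the outer sub(), then apply make.unique() to number the resulting duplicates: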

    make.unique(sub("-.*$", "", sub("\\..*", "", rownames(x))))
    #[1] "ENSP00000424360"   "ENSP00000424360.1" "ENSP00000424360.2"
    #[4] "ENSP00000437781"   "XP_010974537"      "XP_010974538"      "XP_010974538.1"   
    

    Data used:

    x <- structure(list(v1 = 1:7), .Names = "v1", row.names = c("ENSP00000424360.1-D4", 
     "ENSP00000424360.2-D4", "ENSP00000424360.3-D4", "ENSP00000437781-D59", 
     "XP_010974537.1", "XP_010974538.1", "XP_010974538.2"), class = "data.frame")
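
As a possible variation on the answer above, the two nested sub() calls can be collapsed into a single pattern, since both suffixes begin at the first "." or "-" in the ID. This is only a sketch under the assumption that the base ID itself never contains a dot or hyphen (true for the sample IDs here, where XP_ uses an underscore); the names ids, base_ids and result are illustrative and not from the original post.

```r
# Same IDs as in the question, kept as a plain character vector
ids <- c("ENSP00000424360.1-D4", "ENSP00000424360.2-D4", "ENSP00000424360.3-D4",
         "ENSP00000437781-D59", "XP_010974537.1", "XP_010974538.1",
         "XP_010974538.2")

# Step 1: drop everything from the first "." or "-" onward.
# Inside [...] the dot is literal, and a trailing "-" is a literal hyphen.
# Assumption: the base ID contains neither character.
base_ids <- sub("[.-].*", "", ids)

# Step 2: make.unique() leaves the first occurrence untouched and
# appends .1, .2, ... to later repeats of the same value.
result <- make.unique(base_ids)
print(result)
```

For the sample data this should give the same vector as the nested version. If real IDs can contain a hyphen or dot before the version part, the pattern would need to be anchored differently.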