
strsplit function in R for data.table


I have a table where one of my columns (mydata$Gene) contains IDs in the format:

ENSG00000000419.8
ENSG00000000460.12

I would like to understand how to use the strsplit function to remove the .xx suffix,

so that the output looks like:

ENSG00000000419
ENSG00000000460

and so on.

So far I have tried the following code:

strsplit(mydata$Gene, ".", fixed=TRUE)

but I get the error:

Error in strsplit(mydata$Gene, ".", fixed = TRUE) : non-character argument
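For reference, "non-character argument" is the error strsplit() raises when it is given something other than a character vector, most commonly a factor. The minimal reproduction below assumes the column was stored as a factor (for example, created by data.frame() or read.table() with stringsAsFactors = TRUE, the default before R 4.0.0); the ids vector is hypothetical sample data built from the two IDs above.

```r
# Hypothetical sample data mirroring the IDs in the question, stored as a factor
ids <- factor(c("ENSG00000000419.8", "ENSG00000000460.12"))

# strsplit() only accepts character vectors, so a factor triggers an error;
# capture the message instead of stopping
res <- tryCatch(strsplit(ids, ".", fixed = TRUE),
                error = function(e) conditionMessage(e))
print(res)

# Checking the class shows why the call fails
print(is.factor(ids))
print(is.character(ids))
```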

I also tried:

strsplit(mydata$Gene, "\.", fixed=TRUE)

Error: '\.' is an unrecognized escape in character string starting ""\."
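This second error comes from R's string parser rather than from strsplit(). Inside an R string literal a backslash must itself be escaped, so a regular expression for a literal dot is written "\\.". With fixed = TRUE, the pattern is matched literally and needs no escaping at all. The sketch below uses hypothetical character sample data to show that the fixed and regex forms split the same way, while combining "\\." with fixed = TRUE searches for a literal backslash followed by a dot and so splits nothing.

```r
# Hypothetical sample data, already stored as character
ids <- c("ENSG00000000419.8", "ENSG00000000460.12")

# Literal match: fixed = TRUE treats "." as a plain dot, no escaping needed
split_fixed <- strsplit(ids, ".", fixed = TRUE)

# Regex match: an unescaped "." would match any character, so escape it;
# the backslash is doubled because it sits inside an R string literal
split_regex <- strsplit(ids, "\\.")

# Mixing the two: fixed = TRUE looks for the two characters "\." literally,
# which never occur in these IDs, so each string comes back unsplit
split_mixed <- strsplit(ids, "\\.", fixed = TRUE)

print(identical(split_fixed, split_regex))
print(split_mixed)
```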

Any suggestions?

Thank you for your time.


Solution

  • Your column looks like it is a factor, and strsplit() only accepts character vectors, so convert it first. This works:

    > strsplit(as.character(mydata$Gene), ".", fixed=TRUE)
    [[1]]
    [1] "ENSG00000000419" "8"              
    
    [[2]]
    [1] "ENSG00000000460" "12"             
    

    But if all you want is the text before the dot, you might do better with a regular-expression substitution using sub(), which deletes everything from the first dot to the end of the string:

    > sub("\\..*$","",mydata$Gene)
    [1] "ENSG00000000419" "ENSG00000000460"
    >
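Because strsplit() returns a list, its result still needs the first piece of each element extracted before it can go back into the column. The sketch below compares that route with sub() on a hypothetical stand-in for mydata (Gene stored as a factor, to match the situation in the question) and then overwrites the column.

```r
# Hypothetical stand-in for the asker's table; Gene is a factor here
mydata <- data.frame(
  Gene = factor(c("ENSG00000000419.8", "ENSG00000000460.12"))
)

# Route 1: split on the literal dot, then keep the first piece of each element
parts <- strsplit(as.character(mydata$Gene), ".", fixed = TRUE)
via_split <- vapply(parts, `[`, character(1), 1)

# Route 2: delete everything from the first dot to the end of the string;
# sub() accepts the factor and returns a character vector
via_sub <- sub("\\..*$", "", mydata$Gene)

# Both routes should agree
stopifnot(identical(via_split, via_sub))

# Overwrite the column with the stripped IDs
mydata$Gene <- via_sub
print(mydata$Gene)
```

If mydata is actually a data.table, the same substitution can be applied by reference with `mydata[, Gene := sub("\\..*$", "", Gene)]`.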