Remove the version number from a data frame column in R


This is my original data.frame:

cell         counts        gene   
TGCTACC-1     10           ALKBH5
TACACGA-1     20           KDM5C
TCCTTGG-1     30           EZH2
TACGGTC-1     30           PRMT2

I want to remove the trailing "-" and the number after it (for example "-1") from the cell column. How can I do this?

My desired output looks like this:

cell         counts        gene   
TGCTACC       10           ALKBH5
TACACGA       20           KDM5C
TCCTTGG       30           EZH2
TACGGTC       30           PRMT2


Solution

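  • You can use sub() to delete everything from the first "-" onward. It returns a new character vector and leaves df unchanged: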

    sub("-.*", "", df$cell)
    
    [1] "TGCTACC" "TACACGA" "TCCTTGG" "TACGGTC"
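To update the column itself, the result has to be assigned back to df$cell. The sketch below rebuilds the example data from the question (the name df and the numeric type of counts are assumptions) and uses a stricter pattern, "-[0-9]+$", which removes only a hyphen followed by digits at the very end of the string. That pattern is an alternative to the one above, not part of the original answer.

```r
# Rebuild the example data frame from the question (df is an assumed name)
df <- data.frame(
  cell   = c("TGCTACC-1", "TACACGA-1", "TCCTTGG-1", "TACGGTC-1"),
  counts = c(10, 20, 30, 30),
  gene   = c("ALKBH5", "KDM5C", "EZH2", "PRMT2"),
  stringsAsFactors = FALSE
)

# Strip only a trailing "-<digits>" suffix and store the result in the column
df$cell <- sub("-[0-9]+$", "", df$cell)

print(df)
```

The two patterns behave the same on this data. They differ only if a cell name contains an earlier hyphen: "-.*" cuts from the first hyphen, while "-[0-9]+$" leaves everything except the numeric suffix intact.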