How to edit row names in R using the sub or gsub command


I have a gene expression file whose row names look like this: GTEX.1117F.3226.SM.5N9CT. I want to edit the row names to look like this:

GTEX-1117F and so on.

I used these commands:

row.names(gene_exp_transpose) <- data
gsub(".","-",row.names(gene_exp_transpose)) # this just turns every character of every row name into "-"
row.names(gene_exp) <- substr(data, 0, 5) ## but in the last rows the ID has 4 characters instead of 5

Solution

  • A base R solution. Data borrowed from TarJae's answer.

    In the first instruction, the regex is almost identical to TarJae's, with two differences:

    1. The first period to be matched is escaped;
    2. the end of string is made explicit.

    Then the only remaining period is replaced by a dash "-".

    row.names(df) <- sub('^([^.]+\\.[^.]+).*$', '\\1', row.names(df))
    row.names(df) <- sub('\\.', '-', row.names(df))
    row.names(df)
    #> [1] "GTEX-1117F" "GTEX-111FC" "GTEX-1128S" "GTEX-117XS" "GTEX-1192X"
    

    Created on 2022-07-02 by the reprex package (v2.0.1)


    Edit

    onyambu's comment makes the above code a one-liner.

    sub('^([^.]+)\\.([^.]+).*', '\\1-\\2', rownames(df))
    #> [1] "GTEX-1117F" "GTEX-111FC" "GTEX-1128S" "GTEX-117XS" "GTEX-1192X"
    

    Created on 2022-07-02 by the reprex package (v2.0.1)
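
To see why the escaped period matters, here is a minimal, self-contained sketch. The vector `ids` is hypothetical: only the first element comes from the question, and the other two (including a 4-character ID like the ones that tripped up `substr`) are made up for illustration. It contrasts the unescaped `.` from the question, the `fixed = TRUE` variant, and the one-liner above.

```r
# Hypothetical row names; only the first one is taken from the question.
ids <- c("GTEX.1117F.3226.SM.5N9CT",
         "GTEX.111FC.0001.SM.XXXXX",  # made-up suffix
         "GTEX.ZZPU.0001.SM.XXXXX")   # made-up 4-character ID

# An unescaped "." is a regex wildcard, so every character becomes "-".
all_dashes <- gsub(".", "-", ids)

# fixed = TRUE treats "." literally, but gsub() still replaces every period.
every_dot <- gsub(".", "-", ids, fixed = TRUE)

# Keep only the first two period-separated fields, joined by a dash.
short_ids <- sub("^([^.]+)\\.([^.]+).*", "\\1-\\2", ids)
print(short_ids)
```

Because the pattern keys on the periods rather than on character positions, it should handle IDs of any length, which is where the `substr` attempt fell short.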