r, bioinformatics, genetics

Order dataframe by colnames


I have a data frame like this:

           G2_ref G10_ref G12_ref G2_alt G10_alt G12_alt
20011953      3      6      0      5       1     5    
12677336      0      0      0      1       3     6  
20076754      0      3      0     12      16     8 
2089670       0      4      0      1      11     9
9456633       0      2      0      3      10     0 
468487        0      0      0      0       0     0

I'm trying to reorder the columns so that they end up in this order:

G2_ref G2_alt G10_ref G10_alt G12_ref G12_alt

I tried: df[, order(colnames(df))]

But I got this order:

G10_alt G10_ref G12_alt G12_ref G2_alt G2_ref

If anyone has an idea, that would be great.


Solution

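  • Sorting the names directly compares them as strings, which is why G10 comes before G2 and alt before ref. One option is to extract the numeric part of each column name and the suffix after the underscore, then order by the number first and by the suffix second ('ref' before 'alt'):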

    df[order(as.numeric(gsub("\\D+", "", names(df))), 
                factor(sub(".*_", "", names(df)), levels = c('ref', 'alt')))]
    #          G2_ref G2_alt G10_ref G10_alt G12_ref G12_alt
    #20011953      3      5       6       1       0       5
    #12677336      0      1       0       3       0       6
    #20076754      0     12       3      16       0       8
    #2089670       0      1       4      11       0       9
    #9456633       0      3       2      10       0       0
    #468487        0      0       0       0       0       0
    

    Data:

    df <- structure(list(G2_ref = c(3L, 0L, 0L, 0L, 0L, 0L), G10_ref = c(6L, 
    0L, 3L, 4L, 2L, 0L), G12_ref = c(0L, 0L, 0L, 0L, 0L, 0L), G2_alt = c(5L, 
    1L, 12L, 1L, 3L, 0L), G10_alt = c(1L, 3L, 16L, 11L, 10L, 0L), 
        G12_alt = c(5L, 6L, 8L, 9L, 0L, 0L)), .Names = c("G2_ref", 
    "G10_ref", "G12_ref", "G2_alt", "G10_alt", "G12_alt"), 
       class = "data.frame", row.names = c("20011953", 
    "12677336", "20076754", "2089670", "9456633", "468487"))
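
To see why this works, it may help to look at the two sort keys separately. The following is only a sketch on the column names from the question; the variable names `nms`, `num_key`, `suffix_key` and `idx` are made up for illustration. It also assumes each name contains exactly one number, since `gsub("\\D+", "", ...)` removes every non-digit and would glue several numbers together.

```r
# Column names from the question; only the names matter for the ordering.
nms <- c("G2_ref", "G10_ref", "G12_ref", "G2_alt", "G10_alt", "G12_alt")

# Key 1: the digits in each name, converted to numeric so that 2 < 10 < 12.
num_key <- as.numeric(gsub("\\D+", "", nms))

# Key 2: the suffix after the underscore, as a factor with 'ref' ranked before 'alt'.
suffix_key <- factor(sub(".*_", "", nms), levels = c("ref", "alt"))

# order() sorts by key 1 and uses key 2 to break ties.
idx <- order(num_key, suffix_key)

nms[idx]
# "G2_ref"  "G2_alt"  "G10_ref" "G10_alt" "G12_ref" "G12_alt"
```

Indexing the data frame with `idx`, as in `df[idx]`, should then give the same column order as the one-liner above.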