
How do I sort a vector with names containing many strings of numbers in R?


I have a character vector of file names that I would like to sort in ascending order by the number that follows "R" (R1, R2, ...).

    [1] "W2345_S-001-R2-20D.datavalue.csv" "W2346_S-001-R4-20D.datavalue.csv"
    [3] "W2347_S-001-R1-20D.datavalue.csv" "W2348_S-001-R3-20D.datavalue.csv"
    [5] "W2349_S-001-R5-20D.datavalue.csv"

However, mixedsort (from the gtools package) returns the order shown above, which follows the W values. I would like the names ordered by R1, R2, R3, R4, R5 instead, ignoring the other numbers in the names. The output should be:

    [1] "W2347_S-001-R1-20D.datavalue.csv" "W2345_S-001-R2-20D.datavalue.csv"
    [3] "W2348_S-001-R3-20D.datavalue.csv" "W2346_S-001-R4-20D.datavalue.csv"
    [5] "W2349_S-001-R5-20D.datavalue.csv"


Solution

    library(stringr)

    list_of_names <- c("W2345_S-001-R2-20D_790.datavalue.csv",
                       "W2346_S-001-R4-20D_792.datavalue.csv",
                       "W2347_S-001-R1-20D_789.datavalue.csv",
                       "W2348_S-001-R3-20D_791.datavalue.csv",
                       "W2349_S-001-R5-20D_793.datavalue.csv")

    # str_match() returns a matrix: column 1 is the full match, column 2 is the
    # first capture group, here the text between "-R" and the next "-".
    # Convert that to a number and compute the ordering from it.
    names_order <- order(as.numeric(str_match(list_of_names, "-R\\s*(.*?)\\s*-")[, 2]))

    # Reorder the original vector by the R value.
    list_of_names[names_order]

    # [1] "W2347_S-001-R1-20D_789.datavalue.csv" "W2345_S-001-R2-20D_790.datavalue.csv"
    # [3] "W2348_S-001-R3-20D_791.datavalue.csv" "W2346_S-001-R4-20D_792.datavalue.csv"
    # [5] "W2349_S-001-R5-20D_793.datavalue.csv"
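
If loading stringr is not an option, the same idea can probably be expressed in base R. The sketch below assumes every name contains exactly one "-R<digits>-" segment. The helper name sort_by_r is hypothetical and used only for illustration. A name without that segment would yield NA (with a warning) and be placed last by order().

```r
# Base-R sketch: order file names by the number that follows "-R".
# sort_by_r() is a hypothetical helper, not part of any package.
sort_by_r <- function(x) {
  # Keep only the digits between "-R" and the next "-".
  r_value <- as.integer(sub(".*-R([0-9]+)-.*", "\\1", x))
  # Reorder the input by that numeric value (numeric, so R10 sorts after R2).
  x[order(r_value)]
}

list_of_names <- c("W2345_S-001-R2-20D_790.datavalue.csv",
                   "W2346_S-001-R4-20D_792.datavalue.csv",
                   "W2347_S-001-R1-20D_789.datavalue.csv",
                   "W2348_S-001-R3-20D_791.datavalue.csv",
                   "W2349_S-001-R5-20D_793.datavalue.csv")

sorted_names <- sort_by_r(list_of_names)
print(sorted_names)
```

Converting the captured text to an integer before ordering matters: sorting the raw strings would place "10" before "2".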