
Ordering an alphanumeric variable in R


I would like to order a data frame based on an alphanumeric variable. Here is what my dataset looks like:

sample.data <- data.frame(Grade=c(4,4,4,4,3,3,3,3,3,3,3,3),
                          ItemID = c(15,15,15,15,17,17,17,17,16,16,16,16),
                          common.names = c("15_AS_SA1_Correct","15_AS_SA10_Correct","15_AS_SA2_Correct","15_AS_SA3_Correct",
                                            "17_AS_2_B2","17_AS_2_B1","17_AS_5_C1","17_AS_4_D1",
                                           "16_AS_SA1_Negative","16_AS_SA11_Prediction","16_AS_SA12_UnitMeaning","16_AS_SA3_Complete"))

> sample.data
   Grade ItemID           common.names
1      4     15      15_AS_SA1_Correct
2      4     15     15_AS_SA10_Correct
3      4     15      15_AS_SA2_Correct
4      4     15      15_AS_SA3_Correct
5      3     17             17_AS_2_B2
6      3     17             17_AS_2_B1
7      3     17             17_AS_5_C1
8      3     17             17_AS_4_D1
9      3     16     16_AS_SA1_Negative
10     3     16  16_AS_SA11_Prediction
11     3     16 16_AS_SA12_UnitMeaning
12     3     16     16_AS_SA3_Complete

I need to order by Grade and ItemID, then by the common.names variable, which contains alphanumeric values.

I used this:

library(dplyr)

sample.data.ordered <- sample.data %>%
  arrange(Grade, ItemID, common.names)

but it did not work for the whole set: common.names is sorted as text, so 15_AS_SA10_Correct still comes before 15_AS_SA2_Correct.

My desired output is:

> sample.data.ordered
   Grade ItemID           common.names
1      3     16     16_AS_SA1_Negative
2      3     16     16_AS_SA3_Complete
3      3     16  16_AS_SA11_Prediction
4      3     16 16_AS_SA12_UnitMeaning
5      3     17             17_AS_2_B1
6      3     17             17_AS_2_B2
7      3     17             17_AS_4_D1
8      3     17             17_AS_5_C1
9      4     15      15_AS_SA1_Correct
10     4     15      15_AS_SA2_Correct
11     4     15      15_AS_SA3_Correct
12     4     15     15_AS_SA10_Correct

Any thoughts? Thanks!


Solution

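  • A base R solution: use order() on Grade and ItemID, then on a numeric key extracted from common.names. The gsub() call captures the number that follows SA or AS_ (backreference \\2) and, when present, the single digit that follows one character after the next underscore (backreference \\4, e.g. the 2 in B2). The two captures are pasted together and converted with as.numeric(), so SA10 becomes 10 and 2_B1 becomes 21: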

    sample.data[order(sample.data$Grade, 
                  sample.data$ItemID, 
                  as.numeric(gsub(".*(SA|AS_)(\\d+)_(\\w)?(\\d)?.*", "\\2\\4", sample.data$common.names))),]
       Grade ItemID           common.names
    9      3     16     16_AS_SA1_Negative
    12     3     16     16_AS_SA3_Complete
    10     3     16  16_AS_SA11_Prediction
    11     3     16 16_AS_SA12_UnitMeaning
    6      3     17             17_AS_2_B1
    5      3     17             17_AS_2_B2
    8      3     17             17_AS_4_D1
    7      3     17             17_AS_5_C1
    1      4     15      15_AS_SA1_Correct
    3      4     15      15_AS_SA2_Correct
    4      4     15      15_AS_SA3_Correct
    2      4     15     15_AS_SA10_Correct
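
To see why the order() call above gives this result, it can help to print the key that gsub() builds. The sketch below does that, and then shows one possible alternative that keeps the two numbers as separate sort keys instead of pasting their digits together. It uses base R only and assumes every name has at least four "_"-separated fields, as in the example data. The helper first_number() and the variables parts, key1 and key2 are hypothetical names introduced for illustration; they are not part of the original answer.

```r
# Rebuild the example data from the question.
sample.data <- data.frame(
  Grade = c(4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3),
  ItemID = c(15, 15, 15, 15, 17, 17, 17, 17, 16, 16, 16, 16),
  common.names = c("15_AS_SA1_Correct", "15_AS_SA10_Correct",
                   "15_AS_SA2_Correct", "15_AS_SA3_Correct",
                   "17_AS_2_B2", "17_AS_2_B1", "17_AS_5_C1", "17_AS_4_D1",
                   "16_AS_SA1_Negative", "16_AS_SA11_Prediction",
                   "16_AS_SA12_UnitMeaning", "16_AS_SA3_Complete")
)

# Step 1: inspect the key produced by the gsub() call from the answer.
key <- gsub(".*(SA|AS_)(\\d+)_(\\w)?(\\d)?.*", "\\2\\4", sample.data$common.names)
print(data.frame(common.names = sample.data$common.names, key = as.numeric(key)))
# keys, in row order: 1, 10, 2, 3, 22, 21, 51, 41, 1, 11, 12, 3

# Step 2 (alternative sketch): split each name on "_" and sort on two
# separate numeric keys taken from the third and fourth fields.
parts <- strsplit(as.character(sample.data$common.names), "_", fixed = TRUE)

# Hypothetical helper: first run of digits in a string, or NA if there is none.
first_number <- function(s) {
  m <- regmatches(s, regexpr("[0-9]+", s))
  if (length(m) == 0) NA_real_ else as.numeric(m)
}

key1 <- vapply(parts, function(p) first_number(p[3]), numeric(1))  # SA10 -> 10, "2" -> 2
key2 <- vapply(parts, function(p) first_number(p[4]), numeric(1))  # B2 -> 2, Correct -> NA

sample.data.ordered <- sample.data[order(sample.data$Grade,
                                         sample.data$ItemID,
                                         key1, key2), ]
print(sample.data.ordered)
```

On this data the alternative returns the rows in the same order as the gsub() version. Keeping key1 and key2 apart avoids collisions that a pasted key could produce if the names later contained multi-digit suffixes, for example 2_B10 versus 21; whether that matters depends on the real dataset.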