
How to sort data in a column based on specific characters of the sample name in R?


I have a data frame in R that I need to sort by part of the sample name before analysis. Each sample is coded in a column titled "Sample" in the format 1.4.a.2021, where 1 is the week, 4 is the device number, a is the sampling period within the week, and 2021 is the year. I want to sort by the device number (the "4"; values range from 1 to 12). Can I sort by a particular character in a string, if that is what this is?


Solution

  • It sounds like your data is something like this:

    set.seed(1)
    
    df <- data.frame(Sample = paste0("1.", sample(12), ".",sample(letters, 12), "2021"),
                     Data = runif(12))
    df
    #>        Sample      Data
    #> 1   1.9.u2021 0.3823880
    #> 2   1.4.j2021 0.8696908
    #> 3   1.7.v2021 0.3403490
    #> 4   1.1.n2021 0.4820801
    #> 5   1.2.y2021 0.5995658
    #> 6   1.5.g2021 0.4935413
    #> 7   1.3.i2021 0.1862176
    #> 8   1.8.o2021 0.8273733
    #> 9   1.6.e2021 0.6684667
    #> 10 1.11.t2021 0.7942399
    #> 11 1.12.q2021 0.1079436
    #> 12 1.10.w2021 0.7237109
    

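    To sort it, we can extract the device number (the digits between the first and second dots) with a regular expression, convert it to numeric, and order the data frame by it. The numeric conversion matters: compared as text, "10" would sort before "2".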

    df[order(as.numeric(gsub("^.*\\.(\\d+)\\..*$", "\\1", df$Sample))),]
    #>        Sample      Data
    #> 4   1.1.n2021 0.4820801
    #> 5   1.2.y2021 0.5995658
    #> 7   1.3.i2021 0.1862176
    #> 2   1.4.j2021 0.8696908
    #> 6   1.5.g2021 0.4935413
    #> 9   1.6.e2021 0.6684667
    #> 3   1.7.v2021 0.3403490
    #> 8   1.8.o2021 0.8273733
    #> 1   1.9.u2021 0.3823880
    #> 12 1.10.w2021 0.7237109
    #> 10 1.11.t2021 0.7942399
    #> 11 1.12.q2021 0.1079436
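
    If you would rather avoid a regular expression, an alternative is to split each name on the dots and pick fields by position. The following is only a sketch: the four sample names are made up to match the week.device.period.year format described in the question, and the `Week` and `Device` columns are names introduced here for illustration.

    ```r
    # Hypothetical samples in the question's format: week.device.period.year
    df <- data.frame(
      Sample = c("1.10.a.2021", "1.2.b.2021", "2.4.a.2021", "1.4.c.2021"),
      Data   = c(0.1, 0.2, 0.3, 0.4)
    )

    # Split on literal dots; fixed = TRUE stops "." being treated as a regex wildcard.
    # as.character() guards against Sample being a factor in older R versions.
    parts <- strsplit(as.character(df$Sample), ".", fixed = TRUE)

    # Pick fields by position (assumes every name has at least two fields)
    df$Week   <- as.numeric(sapply(parts, `[`, 1))
    df$Device <- as.numeric(sapply(parts, `[`, 2))

    # Sort by device number, breaking ties by week
    sorted <- df[order(df$Device, df$Week), ]
    sorted
    ```

    Keeping the fields as separate columns also makes it easy to sort or group by week or device later without parsing the string again.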
    

    Created on 2022-03-20 by the reprex package (v2.0.1)