r, sorting, vector, natural-sort

R: unexpected natural sorting by gtools mixedsort


I found some unexpected behavior in gtools::mixedsort while checking how my outputs, which should be naturally sorted, are actually ordered.

I have an example like this:

aa=c("CD57","CD58","CD158","CD158b","CD158e","CD158e1","CD319","CD335")
gtools::mixedsort(aa)

My expected result would be:

[1] "CD57"    "CD58"    "CD158"   "CD158b"  "CD158e"  "CD158e1" "CD319"
[8] "CD335"

However I obtain this:

[1] "CD57"    "CD58"    "CD158"   "CD158b"  "CD158e"  "CD319"   "CD335"
[8] "CD158e1"

Is this correct? What is the reason?


Solution

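  • CD158e1 is treated as 1580 here, because the embedded 158e1 is read as a single number written in scientific notation. Python, for example, parses the same literal the same way:

    >>> 158e1
    1580.0

    In 158e1 the e means "times ten to the power of", so 158e1 is 158 × 10^1 = 1580. e2 would multiply by 100 instead, and so on.

    Since 1580 is larger than every other number in the vector (the next largest is 335), CD158e1 is sorted last.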


    As mentioned in the documentation of mixedsort:

    These functions sort or order character strings containing embedded numbers so that the numbers are numerically sorted rather than sorted by character value. I.e. "Aspirin 50mg" will come before "Aspirin 100mg". In addition, case of character strings is ignored so that "a", will come before "B" and "C".
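To see the effect in R itself, here is a minimal sketch. `first_number` is a hypothetical helper written only for this illustration (it is not part of gtools and only roughly imitates how an embedded number might be extracted). The final call assumes your installed gtools version exposes a `scientific` argument on `mixedsort`, so the code checks for it before using it.

```r
# Base R reads "158e1" as scientific notation: 158 * 10^1.
as.numeric("158e1")

aa <- c("CD57", "CD58", "CD158", "CD158b", "CD158e", "CD158e1", "CD319", "CD335")

# Hypothetical helper (not from gtools): extract the first numeric token,
# optionally allowing an exponent suffix such as "e1".
first_number <- function(x, scientific = TRUE) {
  pattern <- if (scientific) "[0-9]+(e[0-9]+)?" else "[0-9]+"
  as.numeric(regmatches(x, regexpr(pattern, x)))
}

first_number(aa)                      # "CD158e1" contributes 1580
first_number(aa, scientific = FALSE)  # "CD158e1" contributes 158

# Possible workaround, assuming gtools is installed and its mixedsort()
# has a `scientific` argument (checked here rather than taken for granted).
if (requireNamespace("gtools", quietly = TRUE) &&
    "scientific" %in% names(formals(gtools::mixedsort))) {
  print(gtools::mixedsort(aa, scientific = FALSE))
}
```

If the `scientific` argument is available, turning it off should stop the e1 suffix from being read as an exponent, so CD158e1 would be compared using 158 rather than 1580.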