Given an unsorted list of non-zero-padded lines:
seq 100 | shuf
Piping that into sort would give:
1
10
100
11
12
13
...
because sort sorts lexicographically by default. That is why it has the -n option for numerical sorting, which does yield the expected result here. However, if the strings are not purely numeric, this doesn't work:
seq 100 | shuf | sed 's/^/E/' | sort -n
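To see why the prefixed version fails, here is a small sketch (assuming GNU coreutils for seq and shuf). With -n, a line that does not start with a number gets a numeric key of 0, so all the E lines tie and sort falls back to comparing whole lines. LC_ALL=C is set only to make that fallback bytewise and reproducible.

```shell
#!/bin/sh
# Purely numeric input: -n compares the numbers, so this prints 1, 2, 3.
seq 100 | shuf | sort -n | head -n 3

# Prefixed input: no line starts with a number, so every numeric key is 0.
# All lines tie, and the last-resort whole-line comparison (bytewise under
# LC_ALL=C) decides the order: E1, E10, E100, E11.
seq 100 | shuf | sed 's/^/E/' | LC_ALL=C sort -n | head -n 4
```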
Or, for a more complex case:
paste -dS <(seq 100 | shuf | sed 's/^/E/' | sort -n) <(seq 100 | shuf)
which gives:
E1S70
E10S75
E100S41
E11S53
...
The expected output, sorting letters lexicographically but numbers numerically, would be:
E1S70
E10S75
E11S53
E100S41
In other words, each run of digits should be treated as a single block: compared numerically with other numbers, but lexicographically with other characters.
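As an illustration of that block view (a sketch, assuming a grep that supports the common -o extension, such as GNU grep), a line can be split into alternating runs of digits and non-digits; the digit runs are the blocks that should be compared as numbers:

```shell
#!/bin/sh
# Split a line into maximal runs of digits and of non-digits, one per line.
# Prints E, 100, S and 41 on separate lines.
printf '%s\n' E100S41 | grep -oE '[0-9]+|[^0-9]+'
```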
What's an efficient way to sort non-zero-padded mixed strings?
You appear to be describing natural sort:
natural sort order (or natural sorting) is the ordering of strings in alphabetical order, except that multi-digit numbers are treated atomically, i.e., as if they were a single character
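Natural sort can be approximated with plain POSIX tools by zero-padding every run of digits to a fixed width, sorting on that padded key, and then discarding the key (decorate, sort, undecorate). A minimal sketch follows; natsort is a hypothetical helper name, and the 20-digit width and the absence of tab characters in the input are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: natural sort via zero-padded sort keys.
# Assumes input lines contain no tabs and digit runs have at most 20 digits.
natsort() {
  awk '{
    key = ""; rest = $0
    # Walk through each run of digits, left-padding it with zeros to 20 chars.
    while (match(rest, /[0-9]+/)) {
      num = substr(rest, RSTART, RLENGTH)
      while (length(num) < 20) num = "0" num
      key = key substr(rest, 1, RSTART - 1) num
      rest = substr(rest, RSTART + RLENGTH)
    }
    # Emit "padded key<TAB>original line".
    printf "%s\t%s\n", key rest, $0
  }' |
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 |  # bytewise sort on the key only
  cut -f2-                                    # drop the key, keep the line
}

printf '%s\n' E100S41 E11S53 E1S70 E10S75 | natsort
```

Because every digit run has the same width in the key, a bytewise comparison orders the numbers numerically, and since digits sort before letters in the C locale, E10S33 lands before ES2014F3. Note that it gives 020 and 20 the same key and treats . and , as ordinary characters, which is only one possible answer to the ambiguities mentioned next.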
It is not clear how you wish to handle fractional numbers (the decimal separator could be any of various characters, such as . or ,), or a mix of zero-padded and non-padded numbers.
For shell programming, GNU has extended sort with a -V/--version-sort option, although this may not do what you want for a list such as:
B27Y23S1
E10S33
ES020.4F3
ES20.14F3
ES2014F3
YF29399G3G3G
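For the data in the question itself, -V does appear to be enough, since each line is just letters alternating with unpadded integers. A quick check, assuming GNU coreutils sort:

```shell
#!/bin/sh
# GNU sort -V compares runs of digits numerically and the non-digit parts
# character by character, so this prints E1S70, E10S75, E11S53, E100S41.
printf '%s\n' E100S41 E11S53 E1S70 E10S75 | sort -V
```

With bash, the question's own pipeline can end in sort -V the same way, e.g. paste -dS <(seq 100 | shuf | sed 's/^/E/') <(seq 100 | shuf) | sort -V.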
Perl has Sort::Versions, Sort::Key::Natural, etc. (cf. "Perl sort numbers naturally")
Python has natsort, etc. (cf. "Is there a built in function for string natural sort?")
JavaScript has natural-sortby, etc. (cf. "Natural sort of alphanumerical strings in JavaScript")
I expect other environments have also come up with their own solutions.