When using `sort` on the command line, why does the sorted order depend on which field delimiter I use? As an example,
```
$ # The test file:
$ cat test.csv
2,az,a,2
3,a,az,3
1,az,az,1
4,a,a,4
$ # sort based on fields 2 and 3, comma separated. Gives correct order.
$ LC_ALL=C sort -t, -k2,3 test.csv
4,a,a,4
3,a,az,3
2,az,a,2
1,az,az,1
$ # replace , by ~ as field separator, then sort as before. Gives incorrect order.
$ tr "," "~" < test.csv | LC_ALL=C sort -t"~" -k2,3
2~az~a~2
1~az~az~1
4~a~a~4
3~a~az~3
```
The second case not only gets the ordering wrong, but is inconsistent between field 2 (where `az` < `a`) and field 3 (where `a` < `az`).
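There is a mistake in `-k2,3`. It tells `sort` to use a single key that starts at the 2nd field and ends at the 3rd field, so the delimiter between those two fields is part of the key and is compared like any other character. Under `LC_ALL=C`, `,` (byte 44) sorts before every letter, while `~` (byte 126) sorts after every lowercase letter, including `z` (byte 122). Comparing the keys `a,a` and `az,a`, the second bytes are `,` and `z`, so `a,a` comes first; comparing `a~a` and `az~a`, the second bytes are `~` and `z`, so `az~a` comes first. That's why you get different orders with different delimiters.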
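To see exactly what `-k2,3` compares, here is a minimal sketch, assuming a POSIX-style shell with `mktemp` available; the temporary file is only an illustration that recreates the question's `test.csv`. `cut -d, -f2,3` prints the key text (delimiter included), and `printf` with a leading quote prints a character's numeric byte value.

```shell
# Recreate the question's sample data in a temporary file (illustrative only).
f=$(mktemp)
printf '2,az,a,2\n3,a,az,3\n1,az,az,1\n4,a,a,4\n' > "$f"

# The key for -k2,3 is the text from field 2 through field 3,
# including the comma between them (one key per line: az,a then a,az then az,az then a,a).
cut -d, -f2,3 "$f"

# Byte values compared under LC_ALL=C for ',', 'z' and '~': prints "44 122 126".
printf '%d %d %d\n' "'," "'z" "'~"

# The same pair of keys sorts differently depending on the delimiter byte.
printf 'a,a\naz,a\n' | LC_ALL=C sort   # a,a first, because ',' < 'z'
printf 'a~a\naz~a\n' | LC_ALL=C sort   # az~a first, because 'z' < '~'

rm -f "$f"
```

With GNU coreutils, `sort --debug` also underlines the part of each line that a key covers, which makes this kind of mistake easy to spot.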
What you want is the following:

```
LC_ALL=C sort -t"," -k2,2 -k3,3 file
```

And:

```
tr "," "~" < file | LC_ALL=C sort -t"~" -k2,2 -k3,3
```
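This tells `sort` to compare the 2nd field first and, only when two lines have the same 2nd field, to break the tie using the 3rd field. Because each key now covers exactly one field, the delimiter is never part of a key, so for this data both commands produce the same order.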
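One way to check the fix is to run both variants and compare their results. This is a sketch assuming a POSIX-style shell with `mktemp`; the temporary file, the variable names `comma` and `tilde`, and the final `tr '~' ','` (which maps the tilde output back to commas so the two results can be compared directly) are illustrative additions.

```shell
# Recreate the question's sample data in a temporary file (illustrative only).
f=$(mktemp)
printf '2,az,a,2\n3,a,az,3\n1,az,az,1\n4,a,a,4\n' > "$f"

# One key per field: sort on field 2, break ties on field 3.
comma=$(LC_ALL=C sort -t, -k2,2 -k3,3 "$f")

# Same keys with ~ as the delimiter, converted back to commas for comparison.
tilde=$(tr ',' '~' < "$f" | LC_ALL=C sort -t'~' -k2,2 -k3,3 | tr '~' ',')

# Prints 4,a,a,4 / 3,a,az,3 / 2,az,a,2 / 1,az,az,1, one per line.
printf '%s\n' "$comma"

if [ "$comma" = "$tilde" ]; then
    echo "same order"
else
    echo "orders differ"
fi

rm -f "$f"
```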