I have a file named a.csv, which contains:
100008,3
10000,3
100010,5
100010,4
10001,6
100021,7
After running the command sort -k1 -d -t "," a.csv, the result is:
10000,3
100008,3
100010,4
100010,5
10001,6
100021,7
This is unexpected, because 10001 should come before 100010.
I have been trying to understand why this happens for a long time, but couldn't find any answers.
$ sort --version
sort (GNU coreutils) 8.13
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and Paul Eggert.
Some of the other responses have assumed this is a numeric sort vs dictionary sort problem. It isn't: even when sorting alphabetically, the output given in the question is incorrect.
To get the correct sorting, you need to change -k1 to -k1,1:
$ sort -k1,1 -d -t "," a.csv
10000,3
100008,3
10001,6
100010,4
100010,5
100021,7
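The two invocations can be compared side by side with a small script. This is a minimal sketch, not part of the original question: the temporary file and the variable names are illustrative, and LC_ALL=C is forced on the assumption that plain byte-order comparison is wanted (other locales may collate punctuation differently).

```shell
# Assumption: force the C locale so comparisons are plain byte order.
export LC_ALL=C

# Recreate the question's a.csv in a temporary file (path is illustrative).
tmp=$(mktemp)
printf '%s\n' 100008,3 10000,3 100010,5 100010,4 10001,6 100021,7 > "$tmp"

# Key runs from field 1 to the end of the line.
whole_line=$(sort -k1 -d -t "," "$tmp")

# Key is restricted to field 1 only.
first_field=$(sort -k1,1 -d -t "," "$tmp")

printf 'with -k1:\n%s\n\nwith -k1,1:\n%s\n' "$whole_line" "$first_field"
rm -f "$tmp"
```

With the whole-line key, -d ignores everything except letters, digits and blanks, so 10001,6 is compared as 100016, which sorts after 1000105. Restricting the key to the first field removes the second column from the comparison.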
The -k option takes two numbers, the start and end fields to sort on (i.e. -ks,e where s is the start and e is the end). By default, the end field is the end of the line. Hence, -k1 is the same as not giving the -k option at all. To show this, compare:
$ printf "1,a,1\n2,aa,2\n" | sort -k2 -t,
1,a,1
2,aa,2
with:
$ printf "1~a~1\n2~aa~2\n" | sort -k2 -t~
2~aa~2
1~a~1
The first sorts a,1 before aa,2, while the second sorts aa~2 before a~1 since, in ASCII, , < a < ~.
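The ASCII ordering of the three characters can be checked directly from the shell. This is a small sketch that relies on the POSIX printf convention that a numeric argument starting with a quote yields the code of the following character; the variable name is illustrative.

```shell
# A leading quote in a numeric printf argument gives the character's code.
codes=$(printf '%d %d %d' "'," "'a" "'~")
echo "$codes"   # prints: 44 97 126
```

Since 44 < 97 < 126, a comma delimiter sorts before a letter and a tilde delimiter sorts after it, which is why the delimiter changes the result when it is part of the key.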
To get the desired behaviour, therefore, we need to sort on only one field. In your case, that means using 1 as both the start and end field, so you specify -k1,1. If you try the two examples above with -k2,2 instead of -k2, you'll find you get the same (correct) ordering in both cases.
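As a quick check of that claim, the two examples can be rerun with the key limited to the second field. This is a sketch under the same assumption of the C locale; the variable names are illustrative, and the tilde is quoted only as a defensive habit.

```shell
# Assumption: force the C locale so comparisons are plain byte order.
export LC_ALL=C

# Key is field 2 only, so the delimiter never takes part in the comparison.
comma=$(printf '1,a,1\n2,aa,2\n' | sort -t, -k2,2)
tilde=$(printf '1~a~1\n2~aa~2\n' | sort -t'~' -k2,2)

printf '%s\n' "$comma"
printf '%s\n' "$tilde"
```

In both cases the line whose second field is a comes before the line whose second field is aa.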
Many thanks to Eric and Assaf from the coreutils mailing list for pointing this out.