I was puzzled when I saw ls list the following files
in a strange order:
Star Wars Episode II - Attack of the Clones (2002) BDRip.mkv
Star Wars Episode III - Revenge of the Sith (2005) BDRip.mkv
Star Wars Episode I - The Phantom Menace (1999) BDRip.mkv
Star Wars Episode IV - A New Hope (1977) BDRip.mkv
Star Wars Episode VI - Return of the Jedi (1983) BDRip.mkv
Star Wars Episode V - The Empire Strikes Back (1980) BDRip.mkv
From a human perspective, 'I' should come first, then 'II', and so on.
So I created a file with the following content:
$ cat 1
Star Wars Episode II - Attack
Star Wars Episode III - Revenge
Star Wars Episode I - The
Star Wars Episode IV - A
Star Wars Episode VI - Return
Star Wars Episode V - The
If I sort it, I get the same order back:
$ sort 1
Star Wars Episode II - Attack
Star Wars Episode III - Revenge
Star Wars Episode I - The
Star Wars Episode IV - A
Star Wars Episode VI - Return
Star Wars Episode V - The
However, if I remove the '-' and everything after it, it sorts correctly:
$ cat 1
Star Wars Episode II
Star Wars Episode III
Star Wars Episode I
Star Wars Episode IV
Star Wars Episode VI
Star Wars Episode V
$ sort 1
Star Wars Episode I
Star Wars Episode II
Star Wars Episode III
Star Wars Episode IV
Star Wars Episode V
Star Wars Episode VI
So as soon as I add any character after the space, the sort order becomes unpredictable to me:
$ cat 1
Star Wars Episode II y
Star Wars Episode III x
Star Wars Episode I z
Star Wars Episode IV w
Star Wars Episode VI v
Star Wars Episode V u
$ sort 1
Star Wars Episode III x
Star Wars Episode II y
Star Wars Episode IV w
Star Wars Episode I z
Star Wars Episode VI v
Star Wars Episode V u
Any hints on this sort behaviour?
Update: sort reports: sort: using ‘en_CA.UTF-8’ sorting rules
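For reference, the locale that sort uses for ordering follows the usual POSIX precedence of the locale variables: LC_ALL wins if it is set, then LC_COLLATE, then LANG. A minimal sketch of that lookup follows. The function name effective_collate is made up for illustration, the fallback to C when nothing is set is an assumption, and the function only inspects the variables without checking that the locale is actually installed:

```shell
# Print the locale name that governs collation, following POSIX precedence:
# LC_ALL first, then LC_COLLATE, then LANG. Empty values count as unset.
# Falling back to "C" when none is set is an assumption of this sketch.
effective_collate() {
  printf '%s\n' "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
}

effective_collate
```

If this prints a UTF-8 locale such as the one in the message above, sort is comparing lines with that locale's rules rather than by byte value.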
Update #2: As per the comment below, it is because of the locale.
$ ls | LANG=C sort
Star Wars Episode I - The Phantom Menace (1999) BDRip.mkv
Star Wars Episode II - Attack of the Clones (2002) BDRip.mkv
Star Wars Episode III - Revenge of the Sith (2005) BDRip.mkv
Star Wars Episode IV - A New Hope (1977) BDRip.mkv
Star Wars Episode V - The Empire Strikes Back (1980) BDRip.mkv
Star Wars Episode VI - Return of the Jedi (1983) BDRip.mkv
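The same result can be reproduced without the actual files. Here is a small sketch using the shortened titles from earlier. It sets LC_ALL rather than LANG because LC_ALL takes precedence over LC_COLLATE and LANG, so the command should behave the same even if those are set elsewhere in the environment. The variable name sorted is only for illustration:

```shell
# Sort the shortened titles by plain byte values (C locale).
# A space (0x20) sorts before 'I' (0x49), so "I -" comes before "II -".
sorted=$(printf '%s\n' \
  'Star Wars Episode II - Attack' \
  'Star Wars Episode III - Revenge' \
  'Star Wars Episode I - The' \
  'Star Wars Episode IV - A' \
  'Star Wars Episode VI - Return' \
  'Star Wars Episode V - The' | LC_ALL=C sort)

printf '%s\n' "$sorted"
```

In the C locale every character counts, including the space after the Roman numeral, which is why the episodes come out in order I, II, III, IV, V, VI.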
Why does a UTF-8 locale make it different, then? I checked with ru_RU.UTF8 (incorrect sorting) and ru_RU.KOI8-R (proper sorting).
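One way to make sense of the orderings above is to assume that the locale's first comparison pass ignores spaces and punctuation and folds case. The sketch below is only a rough emulation of that idea, not the real collation algorithm (which has further tie-breaking levels), and the function name emulate_locale_sort is hypothetical:

```shell
# Rough emulation of a locale-aware first pass: build a key from each line
# with everything except letters and digits removed and case folded, sort by
# that key using plain byte order, then drop the key again.
emulate_locale_sort() {
  tab=$(printf '\t')
  while IFS= read -r line; do
    key=$(printf '%s' "$line" \
      | LC_ALL=C tr -cd '[:alnum:]' \
      | LC_ALL=C tr '[:upper:]' '[:lower:]')
    printf '%s\t%s\n' "$key" "$line"
  done | LC_ALL=C sort -t "$tab" -k1,1 | cut -f2-
}

result=$(printf '%s\n' \
  'Star Wars Episode II y' \
  'Star Wars Episode III x' \
  'Star Wars Episode I z' \
  'Star Wars Episode IV w' \
  'Star Wars Episode VI v' \
  'Star Wars Episode V u' | emulate_locale_sort)

printf '%s\n' "$result"
```

With the spaces removed, the keys end in "iiix", "iiy", "ivw", "iz", "viv" and "vu", so the trailing letter is compared directly against the next Roman numeral character. That gives the same order as the `sort 1` output for the y/x/z/w/v/u example.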
Update #3: It is about the locale: http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
I think I found the proper explanation of this:
GNU coreutils FAQ: Sort does not sort in normal order
Found it on: sort not sorting as expected (space and locale)