
unix sort is giving different results for different users


I wanted to sort a file named reportA with the following contents:

pat_int_parallel_all


/projects/test
-v ../../../../../../te
min_custom.v
-v ../../../../../../tes
-y ../../../../../../test_
-y ../../../../../../test_lib/test
../../../../../../tesla
/projects/checklist
../../../../../../test_lib/LIB
../../../../../../telib/av
../../../../../../telib/te
+libext+.v
+incdir+/projectsst_relea/ana

When I ran sort -u -r reportA >output, I got this result:

-y ../../../../../../test_lib/test
-y ../../../../../../test_
-v ../../../../../../tes
-v ../../../../../../te 
../../../../../../test_lib/LIB
../../../../../../test 
../../../../../../telib/te
../../../../../../telib/av
/projects/test
/projects/checklist
pat_int_parallel_all
min_custom.v
+libext+.v
+incdir+/projectsst_relea/ana

My locale output is en_US

LANG=en_US
LC_CTYPE="en_US"
LC_NUMERIC="en_US"
LC_TIME="en_US"
LC_COLLATE="en_US"
LC_MONETARY="en_US"
LC_MESSAGES="en_US"
LC_PAPER="en_US"
LC_NAME="en_US"
LC_ADDRESS="en_US"
LC_TELEPHONE="en_US"
LC_MEASUREMENT="en_US"
LC_IDENTIFICATION="en_US"
LC_ALL=

But for another user, the same sort command produced a different output:

pat_int_parallel_all
min_custom.v
/projects/test
/projects/checklist
../../../../../../test_lib/LIB
../../../../../../tesla
../../../../../../telib/te
../../../../../../telib/av
-y ../../../../../../test_lib/test
-y ../../../../../../test_
-v ../../../../../../tes
-v ../../../../../../te
+libext+.v
+incdir+/projectsst_relea/ana

My friend's locale output is C

LANG=C
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C 

I am wondering why a plain Unix sort command gives two different results when my sort alias and SHELL version are the same as the other user's. Even the cshrc settings are the same. Is it due to the special characters?

Can someone explain what's wrong here?


Solution

  • The root cause of the different behavior of sort is the value of LC_COLLATE. The output of man 7 locale says:

    LC_COLLATE

    This category governs the collation rules used for sorting and regular expressions, including character equivalence classes and multicharacter collating elements. This locale category changes the behavior of the functions strcoll(3) and strxfrm(3), which are used to compare strings in the local alphabet. For example, the German sharp s is sorted as "ss".

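    From my (very quick) reading of the sort source code, it transforms the lines to be sorted with strxfrm() to get a basis for comparison, so strings that the locale treats as equivalent compare as equal even if their bytes differ. This matches the two outputs in the question: under en_US the lines are ordered as if the punctuation (-, /, ., +) were ignored, whereas under C they are ordered by raw byte value.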

    That you still get the same output is, as @Amadan said, quite strange. Are you sure you have set the locale properly? Could you try LC_COLLATE="C" sort -ru your_file? (In csh, which does not support the VAR=value command prefix, use env LC_COLLATE=C sort -ru your_file instead.)
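
To make the result independent of each user's environment, the collation locale can be pinned for the one sort invocation. The following is a minimal sketch, not the asker's actual setup: it is written for a POSIX sh, the sample file is a hypothetical subset of reportA, and it assumes mktemp and env are available. The env ... sort line on its own can also be typed at a csh prompt.

```shell
# Illustrative sample: a few lines taken from reportA in the question.
sample=$(mktemp)
printf '%s\n' \
  '/projects/test' \
  '-v ../../../../../../te' \
  'min_custom.v' \
  '+libext+.v' \
  'pat_int_parallel_all' > "$sample"

# env sets the variable for this single command only.
# LC_ALL takes precedence over LC_COLLATE and LANG, so this forces
# byte-wise (C locale) ordering regardless of the user's settings.
sorted_c=$(env LC_ALL=C sort -u -r "$sample")
printf '%s\n' "$sorted_c"

rm -f "$sample"
```

With byte-wise ordering, the reversed output starts with pat_int_parallel_all and min_custom.v and ends with +libext+.v, the same relative order as in the friend's output above. Setting LC_ALL rather than LC_COLLATE is a deliberate choice here: if a user already has LC_ALL set, a per-command LC_COLLATE would be overridden by it.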