
How to interpret "s1" and "s2" in macOS sort --debug output


I am trying to sort a file, let's call it acr_list, that has 3 fields, the last of which is a date column. The first and last fields are always populated, but the second field has gaps, e.g.:

sha:12344                          2022-02-10
sha:12345     ['tag1','tag2']      2022-01-24
sha:12346                          2022-01-11
sha:12347     ['tag3','tag-4']     2022-01-03

I am getting unexpected results when sorting by date using:

sort -b -k 3 --debug acr_list

The lines with gaps are put at the top and sorted by the first field, while the lines without gaps are sorted by date as expected, e.g.:

sha:12344                          2022-02-10
sha:12346                          2022-01-11
sha:12347     ['tag3','tag-4']     2022-01-03
sha:12345     ['tag1','tag2']      2022-01-24
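One way to see why the rows split into two groups is to count the blank-separated fields on each line. The sketch below is a minimal check, assuming awk's default splitting on runs of blanks lines up with how sort numbers fields for -k (sort additionally treats leading blanks as part of a field unless -b is given). It writes a hypothetical sample_list file, rather than touching acr_list, so it can be run on its own.

```shell
#!/bin/sh
# Write a copy of the sample data to a hypothetical file, sample_list,
# so the real acr_list is not overwritten.
printf '%s\n' \
  "sha:12344                          2022-02-10" \
  "sha:12345     ['tag1','tag2']      2022-01-24" \
  "sha:12346                          2022-01-11" \
  "sha:12347     ['tag3','tag-4']     2022-01-03" > sample_list

# Prefix each line with its field count (NF).
# Untagged rows report 2 fields, so they have no field 3 for `sort -k 3`.
awk '{ print NF ": " $0 }' sample_list
```

For this data the untagged rows (sha:12344 and sha:12346) report 2 fields and the tagged rows report 3.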

My main goal is to understand what sort is actually doing and how to interpret the output of the --debug flag; in particular, what do s1, s2 and cmp1=1 mean in the following output? (I've replaced the actual digests and tag names.)

Using collate rules of en_GB.UTF-8 locale
sort_method=mergesort
; k1=<2022-09-16>(10), k2=<2022-04-28>(10); s1=<sha256:12344  ['tag1']          2022-09-16>, s2=<sha256:12345  ['tag2']                2022-04-28>; cmp1=5
; k1=<2022-09-16>(10), k2=<>(0); s1=<sha256:12346  ['tag3']                2022-09-16>, s2=<sha256:12347                            2022-04-14>; cmp1=1

Is this documented anywhere? I've searched many man pages, guides and blogs, but it seems to be obscure.

I think k1 and k2 are the keys (fields) being compared and that the number in parentheses, e.g. (10), is the number of characters in the key, but I can't figure out the other parts.

Thanks!


Solution

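  • In case it helps, I got to the bottom of it:

    On the lines without tags there is no third field, only $1 and $2, so the key for -k 3 is empty on those lines (this is the k2=<>(0) in the debug output: an empty key of length 0). An empty key sorts before any non-empty one, which is why those lines go to the top, and because their keys all compare equal, sort falls back to comparing the whole line, so they end up ordered by the first field.

    Therefore, to sort by date when it is in the last column, you can reformat the list so that the date is in column $1 and then sort by column $1. My aim was mainly to understand how the sort utility behaves when rows have an irregular number of fields.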
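The reformat-and-sort approach can be sketched as a decorate/sort/undecorate pipeline. This is a minimal sketch, assuming every line ends with a YYYY-MM-DD date and starts with a non-blank character: awk's $NF copies the last field to the front whether or not the tag field is present, sort compares only that copied field, and cut removes it again. The sample file name sample_list is hypothetical, and LC_ALL=C is an addition of this sketch for plain byte-order comparison.

```shell
#!/bin/sh
# Write a copy of the sample data to a hypothetical file, sample_list.
printf '%s\n' \
  "sha:12344                          2022-02-10" \
  "sha:12345     ['tag1','tag2']      2022-01-24" \
  "sha:12346                          2022-01-11" \
  "sha:12347     ['tag3','tag-4']     2022-01-03" > sample_list

# 1. Decorate: print the last field (the date) followed by the original line.
# 2. Sort on field 1 only; ISO dates sort chronologically as plain text.
#    LC_ALL=C (an assumption of this sketch) forces byte-order comparison.
# 3. Undecorate: drop everything up to and including the first space.
awk '{ print $NF, $0 }' sample_list | LC_ALL=C sort -k1,1 | cut -d' ' -f2-
```

For the sample data this prints the original lines in date order: sha:12347, sha:12346, sha:12345, sha:12344. Because the date is copied rather than moved, the original lines come back unchanged after cut.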