Tags: linux, sorting, merge, big-o, mergesort

Usage of merge in linux sort utility


I have a question about the sort utility in Linux. Why does it use merge sort? There are sorting algorithms like radix sort or counting sort that have O(n) complexity (in every case, worst and best), whereas merge sort has O(n·log(n)) complexity. Why aren't those faster algorithms used in the sort utility?

This puzzles me, since modern machines can handle the larger RAM usage that those faster algorithms might require.


Solution

  • See the code in coreutils sort.c, which sorts text files. One issue with radix or counting sort is that sort compares variable-length text fields or entire lines, not the fixed-size integer keys those algorithms assume. To improve performance, the initial pass of sort defaults to using 16 threads to sort arrays of pointers to lines of text. Once it transitions to merging files, it switches to a k-way merge, with k defaulting to 16, using a min-heap for the merge (see the sketch below). Defaulting k to 16 makes sense for traditional hard drives, but for fast storage, such as an NVMe SSD, a smaller value of k makes more sense.
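
Below is a minimal sketch of the k-way merge idea with a min-heap, assuming K pre-sorted runs of strings already in memory. This is not the coreutils implementation: real sort merges sorted temporary files and compares lines according to the locale and the requested keys, while this sketch uses plain `strcmp`, a made-up K of 3, and hypothetical in-memory data.

```c
/*
 * Sketch: merge K sorted runs of strings with a min-heap.
 * Illustrative only; not the coreutils sort.c code.
 */
#include <stdio.h>
#include <string.h>

#define K 3  /* number of runs merged at once; coreutils defaults to 16 */

struct node {
    const char *line;  /* current head line of the run */
    size_t run;        /* which run it came from */
    size_t next;       /* next unread position within that run */
};

/* Restore the min-heap property starting at index i (heap of size n). */
static void sift_down(struct node *heap, size_t n, size_t i)
{
    for (;;) {
        size_t smallest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && strcmp(heap[l].line, heap[smallest].line) < 0) smallest = l;
        if (r < n && strcmp(heap[r].line, heap[smallest].line) < 0) smallest = r;
        if (smallest == i)
            return;
        struct node tmp = heap[i];
        heap[i] = heap[smallest];
        heap[smallest] = tmp;
        i = smallest;
    }
}

int main(void)
{
    /* Three already-sorted runs standing in for sorted temporary files. */
    static const char *runs[K][3] = {
        { "apple",  "melon",  "pear" },
        { "banana", "cherry", "kiwi" },
        { "fig",    "grape",  "plum" },
    };
    static const size_t lens[K] = { 3, 3, 3 };

    /* Seed the heap with the first line of every run, then heapify. */
    struct node heap[K];
    size_t n = 0;
    for (size_t r = 0; r < K; r++)
        heap[n++] = (struct node){ runs[r][0], r, 1 };
    for (size_t i = n / 2; i-- > 0; )
        sift_down(heap, n, i);

    /* Repeatedly emit the smallest head and pull the next line of its run. */
    while (n > 0) {
        struct node top = heap[0];
        puts(top.line);
        if (top.next < lens[top.run])
            heap[0] = (struct node){ runs[top.run][top.next], top.run, top.next + 1 };
        else
            heap[0] = heap[--n];  /* run exhausted: shrink the heap */
        sift_down(heap, n, 0);
    }
    return 0;
}
```

Each output step costs O(log k) comparisons against the heap, so merging n total lines takes O(n·log k), which is why the choice of k trades the number of merge passes against per-line comparison cost and I/O pattern.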