sorting mpi

MPI sort implementation


I need to sort a really huge file, several hundred GB in size. Luckily, I have access to a Linux MPI cluster. Does anybody know a good and, most importantly, working sort program that can run in a distributed environment using MPI? What I actually want is to count the unique lines in that file, so if somebody knows a program that does exactly that, even better. Otherwise I can figure out how to do it myself later.


Solution

  • Because no answer was provided, I thought I would just share my results.

    I downloaded the nsort program from ordinal.com (2004 winner of the annual sorting competition at sortbenchmark.org). It sorts amazingly fast, though not in a distributed (cluster) manner. I no longer remember the exact figures, but I got a huge time improvement using nsort: tens of times faster (maybe around 50x) than the default Linux sort.

    Two more things to note:

    • The non-commercial distribution is limited to sorting text files.
    • It has exactly the same interface as the Linux sort utility.
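
Since the original goal was to count unique lines, here is a minimal sketch of that last step, assuming the file has already been sorted (by nsort, Linux sort, or anything else) so that identical lines are adjacent. The function name `count_unique_sorted` and the script name `count_unique.py` are made up for illustration; nothing here comes from nsort itself.

```python
import itertools
import sys


def count_unique_sorted(lines):
    """Count distinct lines in an iterable that is ALREADY sorted.

    Assumption: equal lines are adjacent, which holds for sorted input.
    """
    # Drop the trailing newline so a final line without one still
    # compares equal to an identical earlier line.
    stripped = (line.rstrip("\n") for line in lines)
    # groupby collapses each run of equal adjacent lines into one group,
    # so only a single line is held in memory at a time.
    return sum(1 for _key, _group in itertools.groupby(stripped))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 count_unique.py sorted_output.txt
    # surrogateescape keeps non-UTF-8 bytes from raising a decode error.
    with open(sys.argv[1], "r", encoding="utf-8",
              errors="surrogateescape") as sorted_file:
        print(count_unique_sorted(sorted_file))
```

Because the script streams the file, memory use stays constant no matter how large the sorted output is. With plain Linux tools, `sort -u file | wc -l` gives the same count in one pipeline.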