Search code examples
calgorithmsortingmemoryexternal-sorting

Is that possible to mmap a very big file and using qsort?


I have to sort a large amount of data that can not fit in memory, and one thing could do this I know is "external sort". But I am wondering is that possible to mmap this large data file, and use 'qsort' as it is a 'normal data array'? If that's feasible, what's the differences with 'external sort'?


Solution

  • If the file will fit in a contiguous mapping in your address space, you can do this. If it won't, you can't.

    As to the differences:

    • if the file just about fits, and then you add some more data, the mmap will fail. A normal external sort won't suddenly stop working because you have a little more data.
    • if you don't map it with MAP_PRIVATE, sorting will mutate the original file. A normal external sort won't (necessarily)
    • if you do map it with MAP_PRIVATE, you could crash at any time if the VM doesn't have room to duplicate the whole file. Again, a strictly external sort's memory requirements don't scale linearly with the data size.

    tl;dr

    It is possible, it may fail unpredictably and unrecoverably, you almost certainly shouldn't do it.