algorithm sorting partitioning selection-sort quickselect

Efficient algorithm of partial sorting into N unsorted groups

I'm looking for an efficient algorithm to perform the following: given an array of N items, sort it in a way so that items come as M equal groups, where each group is unsorted, but groups are sorted between each other (all elements in one group are less than any element of the next group).

The easiest way is to just sort the whole array. But it's inefficient, especially if the number of groups is much smaller than the total number of items (e.g. sort a million items into 5 groups).

Currently I've settled on using the quickselect algorithm (specifically, it's Floyd-Rivest variation) that sorts an array into 2 unsorted groups, and then applying it M-1 times. An significant improvement may be to apply divide-and-conquer approach to quickselect, first sorting the array into two groups, then sorting each half, etc., until we have M groups. Kind of a generalization of unordered partial sorting problem.

But I have a gut feeling that there might be a simpler and more efficient approach to the problem. Is there anything I'm missing? Any ideas? I need this for improving bulk insertion performance in my RBush JavaScript library (an efficient R*-tree implementation).

Solution

Here is a restatement of the problem. You need to simultaneously find M-1 ranked elements, and use them to divide the array into M unsorted groups.

I would suggest starting with standard quickselect with random pivots chosen to be close to the median (do the random subsample trick to estimate that), for each case dividing the array into 2, and also dividing the list of ranked elements you need to find into 2 as well. Continue this until you get down to the level where you have a ranked element to find in your current group. Then switch to the Floyd-Rivest variation of quickselect to actually find that element and split the current group into 2. Then coming back out of the quickselect, you can easily piece together the M groups that you want.

This approach has an expected running time of O(N log(M)) with pretty good constants. I doubt that significantly better than that is doable.