Tags: algorithm, sorting, random, comparison, nsarray

Is sortedArrayUsingComparator a safe way to randomize an NSArray?


I was messing around with NSArray functions earlier, and I think I may have happened across the easiest way to randomize an NSArray:

NSArray *randomize(NSArray *arr)
{
    return [arr sortedArrayUsingComparator:^NSComparisonResult(id obj1, id obj2) {
        // Cast before subtracting: arc4random_uniform() returns an unsigned
        // value, and unsigned 0 - 1 wraps around instead of giving -1.
        return (NSComparisonResult)arc4random_uniform(3) - 1; // one of -1, 0, or 1
    }];
}

In theory, this should randomize the NSArray thoroughly. However, after considerable thought, I began to wonder whether this could be unsafe and, depending on the sort algorithm NSArray uses, theoretically turn into an infinite loop.

I tested this on arrays of sizes from 10 to 100,000, and the comparison count grew at roughly N log N (about N * (log10(N) + 2) comparisons per randomization), which isn't bad.
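
For reference, here is a sketch of how such a test can be instrumented, using a __block counter that the comparator bumps on every call (the helper name is just for illustration):

#import <Foundation/Foundation.h>

// Sketch: run one randomized "sort" and report how many times the
// comparator was invoked.
NSUInteger comparisonsForShuffle(NSArray *arr)
{
    __block NSUInteger comparisons = 0;
    (void)[arr sortedArrayUsingComparator:^NSComparisonResult(id obj1, id obj2) {
        comparisons++;
        return (NSComparisonResult)arc4random_uniform(3) - 1;
    }];
    return comparisons;
}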

However, could a situation ever arise where the NSArray theoretically never finishes sorting, hanging or crashing the application? It seems to me that it shouldn't happen, but you never know.


Solution

  • I think this depends on the underlying sorting algorithm.

    Consider what would happen if the underlying sort is a bubble sort. Any time a pair of elements is compared, there is a 1/3 chance the comparator reports them out of order and they get swapped. Bubble sort only terminates after a pass in which no comparison says "out of order," and since each comparison independently says "out of order" with probability 1/3, the probability that a given pass is the last one is (2/3)^n. This means that the expected number of passes before the algorithm terminates is (3/2)^n = 3^n / 2^n. If you try running this algorithm on a reasonably-sized array (say, n = 1000), the expected number of passes is staggeringly huge; n = 1000 gives about 1.233840597 × 10^176 expected passes! This algorithm will eventually terminate, but the expected runtime is so long that from a practical perspective it's effectively infinite.
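
    To see the blow-up concretely, here is a small standalone sketch in plain C (NSArray doesn't expose its internal algorithm, so this simulates bubble sort directly); since the random comparator ignores its arguments, it suffices to count passes until one pass reports no swaps:

        #include <stdio.h>
        #include <stdlib.h>

        // Simulate bubble sort driven by a comparator that reports
        // "out of order" with probability 1/3, and count the passes
        // until a full pass makes no swaps.
        static unsigned long passesUntilSorted(int n)
        {
            unsigned long passes = 0;
            int swapped = 1;
            while (swapped) {
                swapped = 0;
                passes++;
                for (int i = 0; i < n - 1; i++) {
                    if (arc4random_uniform(3) == 0) swapped = 1;
                }
            }
            return passes;
        }

        int main(void)
        {
            // Expected passes grow like (3/2)^n, so even small n get slow.
            for (int n = 4; n <= 20; n += 4) {
                printf("n = %2d: %lu passes\n", n, passesUntilSorted(n));
            }
            return 0;
        }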

    On the other hand, if you try using a different algorithm, such as selection sort, the sort will terminate, but you are not guaranteed to get a uniform distribution. Consider the very first pass of the algorithm, which finds the element to put at position 1. If the distribution really were uniform, every element in the array would have probability 1/n of being placed first. But this is not the case: the first element is kept in the first position unless it is swapped with something, which occurs only if some comparison comes up +1 (or -1, depending on the internals) during the first scan. The probability that none of the n - 1 comparisons triggers a swap is (2/3)^(n-1), which is not the same as 1/n. In fact, for large n it's astronomically unlikely that the first element in the sequence will still be at the front once you're done sorting. Consequently, even though the algorithm terminates, there is no guarantee that you get a uniformly random distribution.
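
    A quick simulation backs this up: with n = 10, a uniform shuffle would leave the original first element in front 1/10 of the time, but the random comparator leaves it there only about (2/3)^9 ≈ 2.6% of the time:

        #include <stdio.h>
        #include <stdlib.h>

        // Simulate the first selection-sort pass: the original first
        // element keeps its spot only if none of the n - 1 comparisons
        // reports a new minimum (probability 1/3 each).
        int main(void)
        {
            const int n = 10, trials = 100000;
            int stays = 0;
            for (int t = 0; t < trials; t++) {
                int swapped = 0;
                for (int i = 1; i < n && !swapped; i++) {
                    if (arc4random_uniform(3) == 0) swapped = 1;
                }
                if (!swapped) stays++;
            }
            printf("first element kept: %.3f (uniform would give %.3f)\n",
                   (double)stays / trials, 1.0 / n);
            return 0;
        }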

    If the underlying algorithm is something like quicksort, heapsort, or mergesort, then the sort will eventually terminate, but I'm not sure whether the result is guaranteed to be uniformly random. I'll think about it for a bit and then update my answer.
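
    In the meantime, if the goal is simply a uniform shuffle, the usual recommendation is to skip sorting altogether and use a Fisher-Yates shuffle, which runs in O(n) and gives every permutation equal probability. A minimal sketch (shuffledArray is an illustrative helper, not a Foundation method):

        #import <Foundation/Foundation.h>

        // Sketch: Fisher-Yates shuffle over an immutable NSArray.
        NSArray *shuffledArray(NSArray *arr)
        {
            NSMutableArray *result = [arr mutableCopy];
            for (NSUInteger i = result.count; i > 1; i--) {
                // Swap slot i-1 with a uniformly random slot in [0, i).
                NSUInteger j = arc4random_uniform((uint32_t)i);
                [result exchangeObjectAtIndex:i - 1 withObjectAtIndex:j];
            }
            return [result copy];
        }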

    Hope this helps!