I am concerned with different styles of creating a sorted array from a Java PriorityQueue containing a few thousand elements. The Java 8 docs say
If you need ordered traversal, consider using Arrays.sort(pq.toArray()).
However, I do like the streaming API, so what I had at first was
Something[] elems = theHeap.stream().sorted(BY_CRITERION.reversed())
.toArray(Something[]::new);
(Where BY_CRITERION
is the custom comparator of the PriorityQueue, and I do want the reverse order of that.) Is there any disadvantage to using this idiom, as compared to the following:
Something[] elems = theHeap.toArray(new Something[0]);
Arrays.sort(elems, BY_CRITERION.reversed());
This latter code certainly appears to be following the API doc recommendation more directly, but apart from that, is it really more efficient in terms of memory, such as fewer temporary structures being allocated etc.?
I would think that the streaming solution would have to buffer the stream elements in a temporary structure (an array?) and then sort them, and finally copy the sorted elements to the array allocated in toArray()
.
While the imperative solution would buffer the heap elements in a newly allocated array and then sort them. So this might be one copy operation fewer. (And one array allocation. The discussion of Collection.toArray(new T[size])
vs. Collection.toArray(new T[0])
is tangentially related here. For example, see here for why on OpenJDK the latter is faster.)
And what about sorting efficiency? The docs of Arrays.sort() say
Temporary storage requirements vary from a small constant for nearly sorted input arrays to n/2 object references for randomly ordered input arrays
while the documentation of Stream.sorted()
is silent on this point. So at least in terms of being reliably documented, the imperative solution would seem to have an advantage.
But is there anything more to know?
Fundamentally, both variants do the same and since both are valid solutions within the library’s intended use cases, there is no reason why the implementations should prefer one over the other regarding choosing algorithms or adding optimizations.
In practice, this means that the most expensive operation, the sorting, ends up at the same implementation methods internally. The sorted(…)
operation of the Stream implementation buffers all elements into an intermediate array, followed by calling Arrays.sort(T[], int, int, Comparator<? super T>)
, which will delegate to the same method as the method Arrays.sort(T[], Comparator<? super T>)
you use in your first variant, the target being a sort method within the internal TimSort
class.
So everything that has been said about the time and space complexity for Arrays.sort
applies to the Stream.sort
as well. But there is a performance difference though. For the OpenJDK implementations up to Java 10, the Stream is not capable of fusing the sorted
with the subsequent toArray
step, to directly use the result array for the sorting step. So currently, the Stream variant bears a final copying step from the intermediate array used for sorting to the final array created by the function passed to toArray
. But future implementations might learn this trick, then, there will be no relevant performance different at all between the two solutions.