I am sorting millions of structs, organized in an array, with the qsort function of the standard C library. To optimize performance, I tried creating an array of pointers to the structs, of the same length, and sorting that instead. Contrary to my expectations, the second variant is slower:
qsort on the array of structs: 199 s
qsort on the array of pointers to structs: 204 s
I expected swapping pointers in memory to be faster than moving whole structs (576 bytes each). Is there a performance problem in my code, or is this known behaviour?
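For context, here is a minimal sketch of the two variants; the Record type, its key field, and the 572-byte payload are placeholders standing in for my actual 576-byte struct:

    #include <stdlib.h>

    /* Placeholder struct: 576 bytes total, sorted by an int key. */
    typedef struct {
        int  key;
        char payload[572];
    } Record;

    /* Variant 1: qsort swaps whole 576-byte Records. */
    static int cmp_struct(const void *a, const void *b)
    {
        const Record *ra = a, *rb = b;
        return (ra->key > rb->key) - (ra->key < rb->key);
    }

    /* Variant 2: qsort swaps pointers (8 bytes on a typical 64-bit
     * system), but every comparison dereferences into a Record
     * stored elsewhere. */
    static int cmp_ptr(const void *a, const void *b)
    {
        const Record *ra = *(const Record *const *)a;
        const Record *rb = *(const Record *const *)b;
        return (ra->key > rb->key) - (ra->key < rb->key);
    }

The calls are then qsort(recs, n, sizeof(Record), cmp_struct) for the struct array and qsort(ptrs, n, sizeof(Record *), cmp_ptr) for the pointer array.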
There is more at play here than the cost of swapping elements.
By creating the array of pointers you add a level of indirection: the pointers being swapped are contiguous, but every comparison has to follow a pointer to a 576-byte struct somewhere else in memory, so reads no longer stream through one contiguous block. The sorting algorithms in standard libraries are tuned for contiguous arrays, so you are probably missing the cache far more often than if you just sorted the bigger array directly.
Quicksort in particular has good locality of reference: each partition step roughly halves the range being sorted, so eventually you are sorting subranges of the original array that fit entirely in your cache.
As a rule of thumb, a cache miss is an order of magnitude slower than a hit. That penalty can easily outweigh the time you save by swapping small pointers instead of copying 576-byte structs.
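If you want to see the effect in isolation, here is a rough, self-contained benchmark sketch; the Record layout, the element count of one million, and the random keys are assumptions for illustration, not the asker's actual setup:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Assumed layout: 576 bytes total, sorted by an int key. */
    typedef struct { int key; char payload[572]; } Record;

    static int cmp_struct(const void *a, const void *b)
    {
        const Record *ra = a, *rb = b;
        return (ra->key > rb->key) - (ra->key < rb->key);
    }

    static int cmp_ptr(const void *a, const void *b)
    {
        const Record *ra = *(const Record *const *)a;
        const Record *rb = *(const Record *const *)b;
        return (ra->key > rb->key) - (ra->key < rb->key);
    }

    int main(void)
    {
        size_t n = 1000000;                       /* assumed count; ~576 MB */
        Record *recs  = malloc(n * sizeof *recs);
        Record **ptrs = malloc(n * sizeof *ptrs);
        if (!recs || !ptrs)
            return 1;

        srand(42);
        for (size_t i = 0; i < n; i++) {
            recs[i].key = rand();                 /* assumed key distribution */
            ptrs[i] = &recs[i];
        }

        /* Sort the pointers first, while recs is still the unsorted
         * array they point into. */
        clock_t t0 = clock();
        qsort(ptrs, n, sizeof *ptrs, cmp_ptr);    /* swaps pointers, chases them */
        clock_t t1 = clock();
        qsort(recs, n, sizeof *recs, cmp_struct); /* swaps 576-byte structs */
        clock_t t2 = clock();

        printf("pointers: %.2fs  structs: %.2fs\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC);

        free(ptrs);
        free(recs);
        return 0;
    }

Build it with optimisations on (e.g. gcc -O2) so the comparison reflects what you measured: the pointer variant saves copying traffic but pays for it in scattered reads during every comparison.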