c++ · algorithm · heap · asymptotic-complexity

Why do my binary heap insertions behave this way in practice?


I implemented an array-based binary heap and a pointer-based binary heap in C++. I ran a small experiment: for varying input sizes n, I performed n insertions. The elements are of type int32_t, and each one is picked uniformly at random (with a Mersenne Twister) from

{1,...,std::numeric_limits<int32_t>::max()}

I ran every experiment 10 times and averaged the CPU time over the runs.

To measure the CPU time I used the following calls:

clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);

and

clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
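
For reference, here is a minimal sketch of how such an experiment could look. The heap implementations from the question are not shown, so std::priority_queue (also an array-based binary heap) stands in for them, and the sizes and repetition count are arbitrary choices; clock_gettime with CLOCK_PROCESS_CPUTIME_ID is POSIX-specific, as in the question. Note that, as written here, the PRNG calls fall inside the timed region, which is relevant to the note at the end of the solution below.

    #include <cstdint>
    #include <cstdio>
    #include <ctime>
    #include <limits>
    #include <queue>
    #include <random>

    // Hypothetical reconstruction of the experiment: for each n, time n random
    // insertions, repeat 10 times, and report the average CPU time.
    static double elapsed_seconds(const timespec& a, const timespec& b) {
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) * 1e-9;
    }

    int main() {
        std::mt19937 gen(std::random_device{}());
        std::uniform_int_distribution<std::int32_t> dist(
            1, std::numeric_limits<std::int32_t>::max());

        for (std::size_t n = 100000; n <= 1600000; n *= 2) {
            const int repetitions = 10;
            double total = 0.0;
            for (int rep = 0; rep < repetitions; ++rep) {
                timespec start{}, end{};
                clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
                std::priority_queue<std::int32_t> heap;  // stand-in for the custom heaps
                for (std::size_t i = 0; i < n; ++i)
                    heap.push(dist(gen));                // n insertions (PRNG cost is timed too)
                clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
                total += elapsed_seconds(start, end);
            }
            std::printf("n=%zu  avg=%.6f s\n", n, total / repetitions);
        }
    }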

Here are the running times

[plot: total running time vs. n for the array-based and pointer-based heaps]

To me it looks like inserting n elements takes linear time rather than n log n time. If I divide the running time by n, I get the following graph:

[plot: running time divided by n, vs. n]

Both running times converge to a constant, which seems to confirm my assumption.

But why? Shouldn't it converge to a logarithmic curve? Isn't each insertion O(log n)?


Solution

  • It is indeed true that the expected time to build a binary heap from random data by repeated insertion is O(n), although the worst-case time (when the input is sorted) is O(n log n). This interesting result has been known for some time, although it is apparently not widely known, presumably because of the popularity of the well-known guaranteed linear-time heapify algorithm due to R. W. Floyd.

    Intuitively, one might expect the average insertion time for random elements to be O(1), based on the assumption that a randomly-built heap approximates a complete binary tree. The insertion algorithm consists of placing an element at the end of the heap and then sifting it up by repeatedly swapping it with its parent until the heap constraint is satisfied.
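
    As a concrete illustration of the step just described, here is a minimal sketch of sift-up insertion into an array-based max-heap stored in a std::vector, where the parent of index i lives at (i - 1) / 2. The function name heap_insert and the sample values are just choices made for this sketch, not code from the question.

        #include <cstdint>
        #include <cstdio>
        #include <utility>
        #include <vector>

        // Place the new element at the end of the heap, then sift it up by
        // swapping with its parent until the (max-)heap constraint holds.
        void heap_insert(std::vector<std::int32_t>& heap, std::int32_t value) {
            heap.push_back(value);
            std::size_t i = heap.size() - 1;
            while (i > 0) {
                std::size_t parent = (i - 1) / 2;
                if (heap[parent] >= heap[i]) break;  // constraint satisfied: stop
                std::swap(heap[parent], heap[i]);    // otherwise move up one level
                i = parent;
            }
        }

        int main() {
            std::vector<std::int32_t> heap;
            for (std::int32_t v : {5, 1, 9, 3, 7}) heap_insert(heap, v);
            std::printf("max = %d\n", heap[0]);      // prints 9
        }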

    If the heap were a complete binary tree, the average insertion time would indeed be O(1), since at each point in the chain of swaps the probability that another swap will be necessary would be 0.5. Thus, half of the time no swap is necessary; a quarter of the time, one swap is needed; an eighth of the time, two swaps are needed; and so on. Summing, for each k ≥ 1, the probability that at least k swaps are needed gives an expected number of swaps of 0.5 + 0.25 + 0.125 + ... = 1.

    Since the heap is only an approximation of a complete binary tree, the above analysis is not sufficient. It is impossible to maintain such a tree without rebalancing, which has a nontrivial cost. But you can demonstrate that the heap is sufficiently similar to a complete binary tree that the expected insertion time is still O(1). The proof is non-trivial; one analysis available online is "Average Case Analysis of Heap Building by Repeated Insertion" (1991) by Ryan Hayward and Colin McDiarmid, which is available from the second author's online publication list.
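
    For readers who would rather see this empirically than work through the proof, a quick (purely illustrative) experiment is to count the sift-up swaps while inserting random values and report the average per insertion. The sketch below reuses the sift-up loop from the earlier snippet with a swap counter added; the sizes and the seed are arbitrary.

        #include <cstdint>
        #include <cstdio>
        #include <limits>
        #include <random>
        #include <utility>
        #include <vector>

        int main() {
            std::mt19937 gen(42);
            std::uniform_int_distribution<std::int32_t> dist(
                1, std::numeric_limits<std::int32_t>::max());

            for (std::size_t n = 1u << 16; n <= (1u << 22); n <<= 2) {
                std::vector<std::int32_t> heap;
                heap.reserve(n);
                unsigned long long swaps = 0;
                for (std::size_t k = 0; k < n; ++k) {
                    heap.push_back(dist(gen));       // insert at the end...
                    std::size_t i = heap.size() - 1;
                    while (i > 0) {                  // ...and sift up, counting swaps
                        std::size_t parent = (i - 1) / 2;
                        if (heap[parent] >= heap[i]) break;
                        std::swap(heap[parent], heap[i]);
                        i = parent;
                        ++swaps;
                    }
                }
                std::printf("n=%zu  average swaps per insertion=%.3f\n",
                            n, static_cast<double>(swaps) / n);
            }
        }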

    While Floyd's heapify algorithm has better worst-case performance and a tighter inner loop, it is possible that the repeated-insertion algorithm is actually faster (on average) for large heaps because of cache effects. See, for example, the 1999 paper "Performance engineering case study: heap construction" by Jesper Bojesen, Jyrki Katajainen, and Maz Spork.
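
    For comparison, here is a minimal sketch of a bottom-up, Floyd-style heapify: sift every internal node down, starting from the last one and ending at the root (std::make_heap in the standard library also builds a heap in O(n) without writing this yourself). The function names and sample values are choices made for this sketch.

        #include <cstdint>
        #include <cstdio>
        #include <utility>
        #include <vector>

        // Push the element at index i down until the max-heap constraint holds
        // for the subtree rooted at i (n is the heap size).
        void sift_down(std::vector<std::int32_t>& a, std::size_t i, std::size_t n) {
            for (;;) {
                std::size_t largest = i, l = 2 * i + 1, r = 2 * i + 2;
                if (l < n && a[l] > a[largest]) largest = l;
                if (r < n && a[r] > a[largest]) largest = r;
                if (largest == i) return;            // both children are smaller: done
                std::swap(a[i], a[largest]);         // move the violation down one level
                i = largest;
            }
        }

        void floyd_heapify(std::vector<std::int32_t>& a) {
            for (std::size_t i = a.size() / 2; i-- > 0; )  // last internal node down to the root
                sift_down(a, i, a.size());
        }

        int main() {
            std::vector<std::int32_t> a{4, 10, 3, 5, 1, 8};
            floyd_heapify(a);
            std::printf("max = %d\n", a[0]);         // prints 10
        }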


    Note:

    When performing experiments like this using random data, it is important to avoid counting the cost of generating the random numbers. For relatively fast algorithms like heap insertion, it is quite possible that the cost of calling the PRNG is significant compared to the cost of the algorithm, with the consequence that the observed results are biased by the linear cost of generating random numbers.

    To avoid this effect, you should pregenerate the random array and then measure the cost of turning it into a heap.
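
    A minimal sketch of that measurement strategy, under the same assumptions as the earlier harness (std::priority_queue standing in for the heaps in question, arbitrary n and seed):

        #include <cstdint>
        #include <cstdio>
        #include <ctime>
        #include <limits>
        #include <queue>
        #include <random>
        #include <vector>

        int main() {
            const std::size_t n = 1000000;
            std::mt19937 gen(12345);
            std::uniform_int_distribution<std::int32_t> dist(
                1, std::numeric_limits<std::int32_t>::max());

            std::vector<std::int32_t> data(n);
            for (auto& x : data) x = dist(gen);        // untimed: pregenerate the input

            timespec start{}, end{};
            clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
            std::priority_queue<std::int32_t> heap;
            for (std::int32_t x : data) heap.push(x);  // timed: the n insertions only
            clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);

            double seconds = (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) * 1e-9;
            std::printf("inserting %zu pregenerated elements took %.6f s of CPU time\n", n, seconds);
        }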

    As has often been observed, O(log n) behaves like O(1) for all practical values of n; if what you have is c1·O(1) + c2·O(log n) where c1 is much larger than c2, the result will look a lot like O(1).