Strange results when plotting (Cormen) Red-black tree insert

I implemented in Red-Black trees in Python according to the pseudo code in Cormen's Introduction to Algorithms.

I wanted to see in my own eyes that my insert is really O(logn) so I plotted the time it takes to insert n=1, 10, 20, ..., 5000 nodes into the tree.

This is the result:

enter image description here

the x-axis is n and the y-axis is the time it took in milliseconds.

To me the graph looks more linear than logarithmic. What can explain that?

Solution

Ok, so the graph displays a measurement of the cost of inserting n elements into your tree, where the x axis is how many elements we've inserted, and the y axis is the total time.

Let's call the function that totals the time it takes to insert n elements into the tree f(n).

Then we can get a rough idea of what f might look like:

f(1) < k*log(1)                 for some constant k.

f(2) < k*log(1) + k*log(2)      for some constant k

...

f(n) < k * [log(1) + log(2) + ... + log(n)]   for some constant k.

Due to how logs work, we can collapse log(1) + ... + log(n):

f(n) < k * [log(1*2*3*...*n)]     for some constant k

f(n) < k * log(n!)                for some constant k

We can take a look at Wikipedia to see a graph of what log(n!) looks like. Take a look at the graph in the article. Should look pretty familiar to you. :)

That is, I think you've done this by accident:

for n in (5000, 50000, 500000):
    startTime = ...
    ## .. make a fresh tree
    ## insert n elements into the tree
    stopTime = ...
    ## record the tuple (n, stopTime - startTime) for plotting

and plotted total time to construct the tree of size n, rather than the individual cost of inserting one element into a tree of size n:

for n in range(50000):
    startTime = ...
    ## insert an element into the tree
    stopTime = ...
    ## record the tuple (n, stopTime - startTime) for plotting

Chris Taylor notes in the comments that if you plot f(n)/n, you'll see a log graph. That's because a fairly tight approximation to log(n!) is n*log(n) (see the Wikipedia page). So we can go back to our bound:

f(n) < k * log(n!)                for some constant k

and get:

f(n) < k * n * log(n)             for some constant k

And now it's should be easier to see that if you divide f(n) by n, your graph will be bounded above by the shape of a logarithm.