Search code examples
pythonbig-ored-black-tree

Strange results when plotting (Cormen) Red-black tree insert


I implemented in Red-Black trees in Python according to the pseudo code in Cormen's Introduction to Algorithms.

I wanted to see in my own eyes that my insert is really O(logn) so I plotted the time it takes to insert n=1, 10, 20, ..., 5000 nodes into the tree.

This is the result:

enter image description here

the x-axis is n and the y-axis is the time it took in milliseconds.

To me the graph looks more linear than logarithmic. What can explain that?


Solution

  • Ok, so the graph displays a measurement of the cost of inserting n elements into your tree, where the x axis is how many elements we've inserted, and the y axis is the total time.

    Let's call the function that totals the time it takes to insert n elements into the tree f(n).

    Then we can get a rough idea of what f might look like:

    f(1) < k*log(1)                 for some constant k.
    
    f(2) < k*log(1) + k*log(2)      for some constant k
    
    ...
    
    f(n) < k * [log(1) + log(2) + ... + log(n)]   for some constant k.
    

    Due to how logs work, we can collapse log(1) + ... + log(n):

    f(n) < k * [log(1*2*3*...*n)]     for some constant k
    
    f(n) < k * log(n!)                for some constant k
    

    We can take a look at Wikipedia to see a graph of what log(n!) looks like. Take a look at the graph in the article. Should look pretty familiar to you. :)

    That is, I think you've done this by accident:

    for n in (5000, 50000, 500000):
        startTime = ...
        ## .. make a fresh tree
        ## insert n elements into the tree
        stopTime = ...
        ## record the tuple (n, stopTime - startTime) for plotting
    

    and plotted total time to construct the tree of size n, rather than the individual cost of inserting one element into a tree of size n:

    for n in range(50000):
        startTime = ...
        ## insert an element into the tree
        stopTime = ...
        ## record the tuple (n, stopTime - startTime) for plotting
    

    Chris Taylor notes in the comments that if you plot f(n)/n, you'll see a log graph. That's because a fairly tight approximation to log(n!) is n*log(n) (see the Wikipedia page). So we can go back to our bound:

    f(n) < k * log(n!)                for some constant k
    

    and get:

    f(n) < k * n * log(n)             for some constant k
    

    And now it's should be easier to see that if you divide f(n) by n, your graph will be bounded above by the shape of a logarithm.