
Python big_o seems to return completely incorrect results - what am I doing wrong?


I am comparing the runtimes of different ways of flattening a list of lists using the big_o module, and for the following methods big_o does not return the results I expect, namely:

  1. This one:

    def itertools_chain_from_iterable(arr): 
        return list(chain.from_iterable(arr))

returns "constant", which can't possibly be true.

  2. This one:

    def merge_extend(self,arr):
        output = []
        for l in arr:
            output.extend(l)
        return output

returns "cubic" (shouldn't it be quadratic at most?), while...

  3. ...this one:

    def merge_w_sum(self,arr):
        return sum(arr,[])

returns "linear" (I'm quite sure it should be quadratic, see proof here.

  4. Furthermore, the list comprehension one:

    def merge_bucket(self,bucket):
        return [number for n in bucket for number in n]

returns "polynomial", which seems terrifying (would expect linear here as well)

Code used to calculate the complexities:

print('<function name>:', big_o.big_o(<function name>, 
       lambda n:[big_o.datagen.integers(9900,1,9999999) for n in range(50)],
       n_measures=20)[0])

Output:

complexity of itertools_chain_from_iterable: Constant: time = 0.0013 (sec)
complexity of merge_w_sum: Linear: time = 0.46 + 6.2E-07*n (sec)
complexity of merge_extend: Cubic: time = 0.048 + -2.3E-18*n^3 (sec)
complexity of merge_bucket: Polynomial: time = 0.2 * x^-0.019 (sec)

What is it that I'm doing (or understanding) wrong? Many thanks in advance for useful tips!


Solution

  • Your lambda ignores its argument n and instead always produces input of the same constant size. Produce input of size n instead.

    (A note: originally the question had 8 functions and 7 of them were judged "constant time". It was then edited to use a larger constant input and got other judgements. I guess your computer's speed is somewhat unstable, as a constant-size input should still lead to constant runtimes and thus "constant time" judgements. In any case, the input-generating function should be fixed as described above.)

    Given that the input is two-dimensional, you could for example produce n lists of size 1, one list of size n, or sqrt(n) lists of size sqrt(n) (sketches of the latter two follow the example below). Presumably n lists of size 1 is what you want if your goal is to demonstrate that sum(arr, []) is bad. For example:

    lambda n: [[i] for i in range(n)]
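
    If you instead wanted one of the other two shapes, minimal sketches could look like this (the generator names are just illustrative):

    import math

    one_long_list = lambda n: [list(range(n))]     # one list of size n
    sqrt_by_sqrt = lambda n: [[0] * math.isqrt(n)  # ~sqrt(n) lists of
                              for _ in range(math.isqrt(n))]  # ~sqrt(n) elements each

    With sqrt_by_sqrt the total element count is isqrt(n)**2, slightly under n, which is close enough for the curve fit.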
    

    A full program:

    import big_o
    from itertools import chain
    
    def chained(arr):
        return list(chain.from_iterable(arr))
    
    def merge_extend(arr):
        output = []
        for l in arr:
            output.extend(l)
        return output
    
    def merge_w_sum(arr): 
        return sum(arr,[])
    
    def merge_bucket(bucket):
        return [number for n in bucket for number in n]
    
    funcs = [
        (chained, 10**5),
        (merge_extend, 10**5),
        (merge_w_sum, 10**4),
        (merge_bucket, 10**5),
    ]
    
    for _ in range(3):
        for func, max_n in funcs:
            complexity = big_o.big_o(
                func,
                lambda n: [[0]] * n,
                max_n=max_n,
                n_timings=10,
            )[0]
            print(
                f'{func.__name__:13}',
                complexity
            )
        print()
    

    Sample results:

    chained       Linear: time = 8.2E-05 + 5.8E-08*n (sec)
    merge_extend  Linear: time = -4.2E-06 + 8.4E-08*n (sec)
    merge_w_sum   Quadratic: time = 0.00013 + 2.4E-09*n^2 (sec)
    merge_bucket  Linear: time = 0.00046 + 8E-08*n (sec)
    
    chained       Logarithmic: time = -0.0075 + 0.0014*log(n) (sec)
    merge_extend  Linear: time = -2E-05 + 8.5E-08*n (sec)
    merge_w_sum   Quadratic: time = 0.00011 + 2.4E-09*n^2 (sec)
    merge_bucket  Linear: time = -4.2E-05 + 2.6E-07*n (sec)
    
    chained       Linear: time = -1.8E-05 + 1.6E-07*n (sec)
    merge_extend  Logarithmic: time = -0.01 + 0.0019*log(n) (sec)
    merge_w_sum   Quadratic: time = 8.3E-05 + 2.4E-09*n^2 (sec)
    merge_bucket  Linear: time = 7.1E-05 + 8.3E-08*n (sec)
    

    You can see it gets it right most of the time, but it still sometimes guesses logarithmic instead of linear. On a more stable system it might work better. Larger max_n values should help too, but when I tried that, big_o crashed with a known internal error.
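
    As an aside, the Quadratic verdict for merge_w_sum matches the proof linked in the question: sum(arr, []) concatenates with +, which allocates a brand-new list each time and re-copies everything accumulated so far. A back-of-the-envelope copy count (a rough illustration, not big_o output):

    def copies_with_sum(n):
        # sum(arr, []) evaluates (([] + arr[0]) + arr[1]) + ...;
        # each + allocates a new list and copies the whole accumulator.
        total, acc = 0, 0
        for _ in range(n):
            acc += 1      # accumulated length after this concatenation
            total += acc  # elements copied by this concatenation
        return total

    print(copies_with_sum(10**4))  # 50005000, i.e. ~n**2/2 copies
    # merge_extend copies each element once (amortized): ~n copies total.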

    Also note that I used different max_n values for the different functions. Using the same value for all is no good: if it's large, the quadratic-time solution takes unbearably long, and if it's small, the linear-time solutions finish so quickly that big_o has trouble differentiating them from logarithmic or linearithmic. There doesn't seem to be a middle value that works for all of them. Ideally, big_o would adjust max_n automatically, but I don't think it supports that.
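
    If you want to approximate that yourself, a rough sketch (pick_max_n is a hypothetical helper, not part of big_o's API) could double n until a single call uses up a small time budget:

    import time

    def pick_max_n(func, data_gen, budget=0.1, start=1000, limit=10**7):
        # Hypothetical helper: grow n until one call to func takes about
        # `budget` seconds, so the curve fit gets a usable range of n.
        n = start
        while n < limit:
            data = data_gen(n)
            t0 = time.perf_counter()
            func(data)
            if time.perf_counter() - t0 >= budget:
                break
            n *= 2
        return n

    # For example:
    # big_o.big_o(merge_w_sum, lambda n: [[0]] * n,
    #             max_n=pick_max_n(merge_w_sum, lambda n: [[0]] * n))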