
Python big_o seems to return completely incorrect results - what am I doing wrong?


I am comparing the runtimes of different ways of flattening a list of lists using the big_o module, and for the following methods big_o does not return the results I expect, namely:

  1. This one:

    def itertools_chain_from_iterable(arr): 
        return list(chain.from_iterable(arr))

returns "constant", which can't possibly be true.

  2. This one:

    def merge_extend(self,arr):
        output = []
        for l in arr:
            output.extend(l)
        return output

returns "cubic" (shouldn't it be quadratic at most?), while...

  3. ...this one:

    def merge_w_sum(self,arr):
        return sum(arr,[])

returns "linear" (I'm quite sure it should be quadratic, see proof here.

  4. Furthermore, the list comprehension one:

    def merge_bucket(self,bucket):
        return [number for n in bucket for number in n]

returns "polynomial", which seems terrifying (would expect linear here as well)

Code used to calculate the complexities:

print('<function name>:', big_o.big_o(<function name>, 
       lambda n:[big_o.datagen.integers(9900,1,9999999) for n in range(50)],
       n_measures=20)[0])

Output:

complexity of itertools_chain_from_iterable: Constant: time = 0.0013 (sec)
complexity of merge_w_sum: Linear: time = 0.46 + 6.2E-07*n (sec)
complexity of merge_extend: Cubic: time = 0.048 + -2.3E-18*n^3 (sec)
complexity of merge_bucket: Polynomial: time = 0.2 * x^-0.019 (sec)

What is it that I'm doing (or understanding) wrong? Many thanks in advance for useful tips!


Solution

  • Your lambda ignores its argument n and instead always produces input of the same constant size. Produce input of size n instead.

    (A note: originally the question had 8 functions and 7 of them were judged "constant time". It was then edited to use a larger constant input and got other judgements. I guess your computer's speed is somewhat unstable, as a constant-size input should still lead to constant runtimes and thus "constant time" judgements. In any case, the input-generating function should be fixed as described above.)

    Given that the input is two-dimensional, you could for example produce n lists of size 1, one list of size n, or sqrt(n) lists of size sqrt(n) (sketches of the latter two follow the example below). Presumably n lists of size 1 is what you want if your goal is to demonstrate that sum(arr, []) is bad. For example:

    lambda n: [[i] for i in range(n)]
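
    If you instead wanted one of the other two shapes, minimal sketches could look like this (the generator names are just illustrative):

    import math

    one_long_list = lambda n: [list(range(n))]     # one list of size n
    sqrt_by_sqrt = lambda n: [[0] * math.isqrt(n)  # ~sqrt(n) lists of
                              for _ in range(math.isqrt(n))]  # ~sqrt(n) elements each

    With sqrt_by_sqrt the total element count is isqrt(n)**2, slightly under n, which is close enough for the curve fit.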
    

    A full program:

    import big_o
    from itertools import chain
    
    def chained(arr):
        return list(chain.from_iterable(arr))
    
    def merge_extend(arr):
        output = []
        for l in arr:
            output.extend(l)
        return output
    
    def merge_w_sum(arr): 
        return sum(arr,[])
    
    def merge_bucket(bucket):
        return [number for n in bucket for number in n]
    
    funcs = [
        (chained, 10**5),
        (merge_extend, 10**5),
        (merge_w_sum, 10**4),
        (merge_bucket, 10**5),
    ]
    
    for _ in range(3):
        for func, max_n in funcs:
            complexity = big_o.big_o(
                func,
                lambda n: [[0]] * n,
                max_n=max_n,
                n_timings=10,
            )[0]
            print(
                f'{func.__name__:13}',
                complexity
            )
        print()
    

    Sample results:

    chained       Linear: time = 8.2E-05 + 5.8E-08*n (sec)
    merge_extend  Linear: time = -4.2E-06 + 8.4E-08*n (sec)
    merge_w_sum   Quadratic: time = 0.00013 + 2.4E-09*n^2 (sec)
    merge_bucket  Linear: time = 0.00046 + 8E-08*n (sec)
    
    chained       Logarithmic: time = -0.0075 + 0.0014*log(n) (sec)
    merge_extend  Linear: time = -2E-05 + 8.5E-08*n (sec)
    merge_w_sum   Quadratic: time = 0.00011 + 2.4E-09*n^2 (sec)
    merge_bucket  Linear: time = -4.2E-05 + 2.6E-07*n (sec)
    
    chained       Linear: time = -1.8E-05 + 1.6E-07*n (sec)
    merge_extend  Logarithmic: time = -0.01 + 0.0019*log(n) (sec)
    merge_w_sum   Quadratic: time = 8.3E-05 + 2.4E-09*n^2 (sec)
    merge_bucket  Linear: time = 7.1E-05 + 8.3E-08*n (sec)
    

    You can see it gets it right most of the time, but it still sometimes guesses logarithmic instead of linear. On a more stable system it might work better. Larger max_n values should help too, but when I tried that, big_o crashed with a known internal error.
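
    As an aside, the Quadratic verdict for merge_w_sum matches the proof linked in the question: sum(arr, []) concatenates with +, which allocates a brand-new list each time and re-copies everything accumulated so far. A back-of-the-envelope copy count (a rough illustration, not big_o output):

    def copies_with_sum(n):
        # sum(arr, []) evaluates (([] + arr[0]) + arr[1]) + ...;
        # each + allocates a new list and copies the whole accumulator.
        total, acc = 0, 0
        for _ in range(n):
            acc += 1      # accumulated length after this concatenation
            total += acc  # elements copied by this concatenation
        return total

    print(copies_with_sum(10**4))  # 50005000, i.e. ~n**2/2 copies
    # merge_extend copies each element once (amortized): ~n copies total.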

    Also note that I used different max_n values for the different functions. Using the same value for all is no good: if it's large, the quadratic-time solution takes unbearably long, and if it's small, the linear-time solutions finish so quickly that big_o has trouble differentiating them from logarithmic or linearithmic. There doesn't seem to be a middle value that works for all of them. Ideally, big_o would adjust max_n automatically, but I don't think it supports that.
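
    If you want to approximate that yourself, a rough sketch (pick_max_n is a hypothetical helper, not part of big_o's API) could double n until a single call uses up a small time budget:

    import time

    def pick_max_n(func, data_gen, budget=0.1, start=1000, limit=10**7):
        # Hypothetical helper: grow n until one call to func takes about
        # `budget` seconds, so the curve fit gets a usable range of n.
        n = start
        while n < limit:
            data = data_gen(n)
            t0 = time.perf_counter()
            func(data)
            if time.perf_counter() - t0 >= budget:
                break
            n *= 2
        return n

    # For example:
    # big_o.big_o(merge_w_sum, lambda n: [[0]] * n,
    #             max_n=pick_max_n(merge_w_sum, lambda n: [[0]] * n))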