I am comparing runtimes of different ways of flattening a list of lists using the big_o module, and for following methods my function does not return the expected results, namely:
def itertools_chain_from_iterable(arr):
return list(chain.from_iterable(arr))
returns "constant", which can't possibly be true.
def merge_extend(self,arr):
output = []
for l in arr:
output.extend(l)
return output
returns "cubic" (shouldn't it be quadratic at most?), while...
def merge_w_sum(self,arr):
return sum(arr,[])
returns "linear" (I'm quite sure it should be quadratic, see proof here.
def merge_bucket(self,bucket):
return [number for n in bucket for number in n]
returns "polynomial", which seems terrifying (would expect linear here as well)
Code used to calculate the complexities:
print('<function name>:', big_o.big_o(<function name>,
lambda n:[big_o.datagen.integers(9900,1,9999999) for n in range(50)],
n_measures=20)[0])
Output:
complexity of itertools_chain_from_iterable: Constant: time = 0.0013 (sec)
complexity of merge_w_sum: Linear: time = 0.46 + 6.2E-07*n (sec)
complexity of merge_extend: Cubic: time = 0.048 + -2.3E-18*n^3 (sec)
complexity of merge_bucket: Polynomial: time = 0.2 * x^-0.019 (sec)
What is it that I'm doing (or understanding) wrong? Many thanks in advance for useful tips!
Your lambda ignores its argument n
and instead always produce the same constant-size input. Produce input of size n
instead.
(A note: originally the question had 8 functions and 7 of them were judged "constant time". It was edited to a larger constant and then got other judgements. I guess your computer's speed is somewhat unstable, as the constant input should still lead to constant runtimes and thus "constant time" judgements. In any case, the input-generating function should be fixed like I said.)
Given that it's two-dimensional, you could for example produce n lists of size 1, one list of size n, or sqrt(n) lists of size sqrt(n). Presumably n lists of size 1 is what you want if your goal is to demonstrate that sum(arr, [])
is bad. For example:
lambda n: [[i] for i in range(n)]
A full program:
import big_o
from itertools import chain
def chained(arr):
return list(chain.from_iterable(arr))
def merge_extend(arr):
output = []
for l in arr:
output.extend(l)
return output
def merge_w_sum(arr):
return sum(arr,[])
def merge_bucket(bucket):
return [number for n in bucket for number in n]
funcs = [
(chained, 10**5),
(merge_extend, 10**5),
(merge_w_sum, 10**4),
(merge_bucket, 10**5),
]
for _ in range(3):
for func, max_n in funcs:
complexity = big_o.big_o(
func,
lambda n: [[0]] * n,
max_n=max_n,
n_timings=10,
)[0]
print(
f'{func.__name__:13}',
complexity
)
print()
Sample results:
chained Linear: time = 8.2E-05 + 5.8E-08*n (sec)
merge_extend Linear: time = -4.2E-06 + 8.4E-08*n (sec)
merge_w_sum Quadratic: time = 0.00013 + 2.4E-09*n^2 (sec)
merge_bucket Linear: time = 0.00046 + 8E-08*n (sec)
chained Logarithmic: time = -0.0075 + 0.0014*log(n) (sec)
merge_extend Linear: time = -2E-05 + 8.5E-08*n (sec)
merge_w_sum Quadratic: time = 0.00011 + 2.4E-09*n^2 (sec)
merge_bucket Linear: time = -4.2E-05 + 2.6E-07*n (sec)
chained Linear: time = -1.8E-05 + 1.6E-07*n (sec)
merge_extend Logarithmic: time = -0.01 + 0.0019*log(n) (sec)
merge_w_sum Quadratic: time = 8.3E-05 + 2.4E-09*n^2 (sec)
merge_bucket Linear: time = 7.1E-05 + 8.3E-08*n (sec)
You can see it gets it right most of the time, but still sometimes thinks logarithmic instead of linear. On a more stable system it might work better. Larger max_n
values should help, too, but I tried and then big_o
crashed with some known internal error.
Also note that I used different max_n
values for the different functions. It's no good to use the same for all. If you use the same for all, then if it's large, a quadratic time solution takes unbearably long, and if it's small, a linear time solution takes so little time that big_o
has trouble differentiating it from logarithmic or linearithmic. There doesn't seem to be a medium value that's good for all. Ideally, big_o
would automatically adjust max_n
appropriately, but I don't think it supports that.