Based on this question: python flatten an array of arrays.
I want a faster way than the double-loop solution, so I wrote a functools-based function, but it seems much slower.
orders2.shape
(9966, 1)
import time
t0 = time.time()
[x for i in orders2.values.tolist() for x in i[0].tolist()]
t1 = time.time()
t1-t0
0.009984493255615234
import time
import functools
t0 = time.time()
functools.reduce(lambda a, b: a + b[0].tolist(), orders2.values.tolist(), [])
t1 = time.time()
t1-t0
1.4101920127868652
My questions are: 1. How could this happen? 2. Will the functools version be faster than the double loop on bigger data? 3. Are there other algorithms faster than the double loop?
In short, function-call and list re-allocation overheads aside, your algorithm with the nested loops is O(N), while the one using reduce is O(N²).
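You can see the asymptotic gap without pandas at all. Below is a minimal sketch where rows is a stand-in for orders2.values.tolist() (here each inner item is already a plain list, so no .tolist() call is needed); absolute timings will vary by machine:

import functools
import time

# Stand-in for orders2.values.tolist(): 5000 rows, each a 1-element list
# wrapping a list of 10 values.
rows = [[list(range(10))] for _ in range(5000)]

t0 = time.time()
flat1 = [x for i in rows for x in i[0]]                     # O(N): each element placed once
t1 = time.time()
flat2 = functools.reduce(lambda a, b: a + b[0], rows, [])   # O(N^2): rebuilds the list every step
t2 = time.time()

assert flat1 == flat2
print("comprehension:", t1 - t0, "reduce:", t2 - t1)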
Even if the algorithms were not different, the idea that calling a function has zero cost comes from mathematics, where functions are nice theoretical constructs.
When running in a computer program, calling a function requires a context to be initialised: in Python's case, the creation of a frame object with local variables. Since arguments are passed around, a tuple holding them has to be constructed prior to the function call and unpacked in the function body (although these steps may be optimised by the implementation).
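The per-call overhead is easy to measure on its own. A rough sketch using timeit (the exact numbers are machine-dependent; the point is the ratio):

import timeit

def f(a, b):
    return a + b

# Each f(a, b) call creates a frame and packs/unpacks the argument tuple;
# the bare expression pays only for the addition itself.
print(timeit.timeit("f(a, b)", setup="a, b = 1, 2", globals=globals(), number=1_000_000))
print(timeit.timeit("a + b", setup="a, b = 1, 2", number=1_000_000))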
In the two-nested-loops approach, by contrast, all you have to do is iterate an iterator in native code. Although theoretically, per Python's specs, that also implies calling a function (the object's __iter__
method), it is usually much faster in real implementations because the iterator itself is native code.
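As a rough illustration, a for loop is equivalent to driving the iterator protocol by hand, as below, except that CPython performs these steps in C rather than through Python-level frames:

it = iter([10, 20, 30])   # invokes list.__iter__
while True:
    try:
        x = next(it)      # invokes the iterator's __next__, implemented in C
    except StopIteration:
        break
    print(x)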
That, however, would not account for the difference you are seeing. The main problem is that on each iteration, when performing a + b[0].tolist(), a new list is created in memory, the values of a are copied into it, and then the values from b[0] are appended. This new list, plus a copy of all the elements already flattened, is produced at every single step, which is what makes the whole thing quadratic.
In the list-comprehension case there is no redundant copying: each element is placed once, as it comes up from the unrolling of the parent 2D structure, and Python is well optimised to pre-allocate space for lists that grow while being built this way.
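If you want reduce-style code without the quadratic copying, extend the accumulator in place instead of building a new list each step. A sketch, with the same rows stand-in as above (extend_and_return is a hypothetical helper, not from the original code); itertools.chain.from_iterable gives the same O(N) behaviour more idiomatically:

import functools
import itertools

rows = [[list(range(10))] for _ in range(5000)]  # stand-in for orders2.values.tolist()

def extend_and_return(acc, row):
    # Hypothetical helper: mutates acc in place, so no per-step copy is made.
    acc.extend(row[0])
    return acc

flat_reduce = functools.reduce(extend_and_return, rows, [])           # O(N)
flat_chain = list(itertools.chain.from_iterable(r[0] for r in rows))  # O(N), idiomatic
assert flat_reduce == flat_chain

That said, mutating the accumulator inside reduce is generally considered unidiomatic; the nested comprehension or itertools.chain.from_iterable are the usual choices.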