python · numpy · vectorization

numpy: optimizing a function applied over a list of arrays


I have these two pieces of code that do the same thing, but for different data structures:

res = np.array([np.array([2.0, 4.0, 6.0]), np.array([8.0, 10.0, 12.0])], dtype=int)
%timeit np.sum(res, axis=1)
4.08 µs ± 728 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
list_obj_array = np.ndarray((2,), dtype=object)
list_obj_array[0] = [2.0, 4.0, 6.0]
list_obj_array[1] = [8.0, 10.0, 12.0]

v_func = np.vectorize(np.sum, otypes=[int])
%timeit v_func(list_obj_array)
20.6 µs ± 486 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

The second one is about five times slower. Is there a better way to optimize this?

@nb.jit()
def nb_np_sum(arry_list):
    return [np.sum(row) for row in arry_list]

%timeit nb_np_sum(list_obj_array)
30.8 µs ± 5.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
@nb.jit()
def nb_sum(arry_list):
    return [sum(row) for row in arry_list]

%timeit nb_sum(list_obj_array)
13.6 µs ± 669 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Best so far (thanks @hpaulj)

%timeit [sum(l) for l in list_obj_array]
850 ns ± 115 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
@nb.njit()
def nb_sum(arry_list):
    return [sum(row) for row in arry_list]


TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'sum': cannot determine Numba type of <class 'builtin_function_or_method'>

File "<ipython-input-54-3bb48c5273bb>", line 3:
def nb_sum(arry_list):
        return [sum(row) for row in arry_list]

For a longer array:

list_obj_array = np.ndarray((n,), dtype=object)
for i in range(n):
    list_obj_array[i] = list(range(7))

the vectorized version comes closer to the best option (the list comprehension):

%timeit [sum(l) for l in list_obj_array]
23.4 µs ± 4.19 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit v_func(list_obj_array)
29.6 µs ± 4.91 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Numba is still slower:

%timeit nb_sum(list_obj_array)
74.4 µs ± 6.11 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Solution

  • Since you used otypes, you've read enough of the vectorize docs to know that it is not a performance tool.

    In [430]: timeit v_func(list_obj_array)
    38.3 µs ± 894 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    

    A list comprehension is faster:

    In [431]: timeit [sum(l) for l in list_obj_array]
    2.08 µs ± 62.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    

    Even better if you start with a list of lists instead of an object dtype array:

    In [432]: alist = list_obj_array.tolist()
    In [433]: timeit [sum(l) for l in alist]
    542 ns ± 11.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    
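    If the rows all have the same length, as they do here, it's usually fastest to convert the object array into a true 2D array and do a single vectorized reduction — a sketch, assuming equal-length rows:

```python
import numpy as np

# object array of equal-length rows, as in the question
list_obj_array = np.ndarray((2,), dtype=object)
list_obj_array[0] = [2.0, 4.0, 6.0]
list_obj_array[1] = [8.0, 10.0, 12.0]

# stack into a real 2D float array, then sum each row in one call
arr2d = np.array(list_obj_array.tolist())  # shape (2, 3)
row_sums = arr2d.sum(axis=1)               # array([12., 30.])
```

    This only works for equal-length rows; for ragged rows you're back to the per-row options above.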

    edit

    np.frompyfunc is faster than np.vectorize, especially when working with object dtype arrays:

    In [459]: np.frompyfunc(sum,1,1)(list_obj_array)
    Out[459]: array([12.0, 30.0], dtype=object)
    In [460]: timeit np.frompyfunc(sum,1,1)(list_obj_array)
    2.22 µs ± 16.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    

    As I've seen elsewhere, frompyfunc is competitive with the list comprehension.
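    One caveat: frompyfunc always returns an object dtype array, so if you need a numeric result you have to cast it yourself — a small sketch:

```python
import numpy as np

list_obj_array = np.ndarray((2,), dtype=object)
list_obj_array[0] = [2.0, 4.0, 6.0]
list_obj_array[1] = [8.0, 10.0, 12.0]

f = np.frompyfunc(sum, 1, 1)      # nin=1, nout=1
res = f(list_obj_array)           # object dtype array
res_num = res.astype(np.float64)  # cast to a numeric dtype when needed
```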

    Interestingly, using np.sum instead of sum slows it down. I think that's because np.sum applied to lists has the overhead of converting the lists to arrays. sum applied to lists of numbers is pretty good, since it uses Python's own compiled code.

    In [461]: timeit np.frompyfunc(np.sum,1,1)(list_obj_array)
    30.3 µs ± 165 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    

    So let's try sum in your vectorize:

    In [462]: v_func = np.vectorize(sum, otypes=[int])
    In [463]: timeit v_func(list_obj_array)
    8.7 µs ± 331 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    

    Much better.
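    As a quick sanity check, the sum-based vectorize produces the same values as the other approaches, using the two-row array from the question:

```python
import numpy as np

list_obj_array = np.ndarray((2,), dtype=object)
list_obj_array[0] = [2.0, 4.0, 6.0]
list_obj_array[1] = [8.0, 10.0, 12.0]

v_func = np.vectorize(sum, otypes=[int])
result = v_func(list_obj_array)  # otypes casts the float sums to int
```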