
Mapping a function on a multi-dimensional array


I have a large array of dimensions MxNxK. I want to loop through axis M and apply a function on each of the NxK arrays. I want to avoid a for loop.

To illustrate the point, I build a 10x5x4 array.

import numpy as np

data = []
for i in range(1, 11):
    a = np.full(shape=(5, 4), fill_value=i)
    data.append(a)

data = np.array(data)

Now, I loop through the first dimension and apply a simple sum/mean on each of the 5x4 arrays.

means = []
sums = []

for i in data:
    means.append(i.sum())
    sums.append(i.mean())

In reality, I have a 5000x200x20 array and I want to apply a function on each of the 200x20 arrays, but I want to avoid the loop. "apply_along_axis" doesn't seem to address the issue. Please help.

Edit: In response to @chrslg's answer, adding some more information. In my case I want to apply a probability distribution (pdf) and an Nth percentile on each of the M subarrays in the MxNxK array. I noticed that, using numba, the first function call indeed takes long (7 s), but every successive execution takes merely 0.77 s, whereas the conventional loop through the M subarrays takes 3.9 s.


Solution

  • In this case

    # keeping the same mean/sum logic inversion as in your code
    means = data.sum(axis=(1, 2))
    sums = data.mean(axis=(1, 2))
    

    should work.
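
    If you want to double-check the equivalence, a quick comparison against your loop results could look like this (my own illustration, not part of the original answer; it assumes the `data` array from the question is in scope):

    # sanity check: the vectorized results match what the loop builds
    loop_means = [sub.sum() for sub in data]   # same inversion as in the question
    loop_sums = [sub.mean() for sub in data]
    assert np.allclose(data.sum(axis=(1, 2)), loop_means)
    assert np.allclose(data.mean(axis=(1, 2)), loop_sums)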

    Generally speaking, there are ways to avoid a for loop when applying a function to all the data: apply_along_axis, for example, or vectorize.

    def mymean(subarr):
        return subarr.mean()

    # signature '(m,n)->()' means: consume a 2D subarray, produce a scalar
    mymeanvec = np.vectorize(mymean, signature='(m,n)->()')

    mymeanvec(data)
    

    is one way.
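
    Since apply_along_axis only maps over 1-D slices, another option along the same lines (my own sketch, not from the original answer) is to flatten each NxK block first:

    # reshape to (M, N*K) so apply_along_axis can work on 1-D rows
    flat = data.reshape(data.shape[0], -1)
    means_aaa = np.apply_along_axis(np.mean, 1, flat)   # one value per subarray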

    But that is not advisable. Those functions just avoid the for loop, not the M calls to a python function. That is why numpy's documentation clearly states that those functions are not meant for performance.

    The good way to really vectorize is to think vectorized. Most numpy operations can work directly on arrays and do the for loops themselves.
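
    For instance, for the percentile mentioned in the question's edit, np.percentile accepts a tuple of axes, so no python loop is needed at all (a sketch of mine, with the 95th percentile as an arbitrary example):

    # one percentile per NxK subarray, computed in a single vectorized call
    p95 = np.percentile(data, 95, axis=(1, 2))   # shape (M,)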

    Another way, when that happens to be impossible because what you are doing is too exotic to be written without for loops using a combination of numpy operations, is to use numba:

    from numba import jit
    
    @jit(nopython=True)
    def myNumbaMean(data):
        # It is always better to have correctly shaped arrays preallocated and avoid `append`,
        # especially with numpy (appending to pure-python lists is efficient; growing numpy arrays is not)
        means = np.empty((len(data),))
        sums = np.empty((len(data),))
        for i in range(len(data)):
            # still the same inversion
            means[i] = data[i].sum()
            sums[i] = data[i].mean()
        return means, sums
    
    

    The first call takes some time, because numba compiles the function. Then it is as fast as C code.
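
    If the first call seems slow (the ≈7 s mentioned in the question's edit), that is the compilation cost. A minimal warm-up sketch, plus numba's standard cache option to keep the compiled code across sessions (my addition, not from the original answer):

    means, sums = myNumbaMean(data)   # first call: includes compilation
    means, sums = myNumbaMean(data)   # subsequent calls: compiled speed

    # optionally, cache the compiled function on disk between sessions:
    # @jit(nopython=True, cache=True)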

    In fact, with numba, you could even avoid relying on .mean() and .sum() if you wanted to do something more exotic. Even this apparently naive code would be quite efficient:

    @jit(nopython=True)
    def myNaiveMean(data):
        means = np.empty((len(data),))
        sums = np.zeros((len(data),))
        m,n,p=data.shape
        # this time without the inversion
        for i in range(m):
            for j in range(n):
                for k in range(p):
                    sums[i] += data[i,j,k]
            means[i] = sums[i]/n/p
        return means,sums
    

    Timings

    Some timing considerations: on your data shape (5000, 200, 20), timings are as follows:

    Method                            Timing
    python (yours)                    133 ms
    Vectorization                     119 ms
    Numba                              65 ms
    Direct numpy (my first answer)     54 ms
    Numba naive                        35 ms
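
    For reference, numbers like these can be measured with something along the lines of the following sketch (my own, not the author's benchmark script; it reuses the functions defined above):

    import timeit

    big = np.random.rand(5000, 200, 20)
    myNumbaMean(big)                                                       # warm-up: trigger compilation
    print(timeit.timeit(lambda: big.sum(axis=(1, 2)), number=10) / 10)    # direct numpy
    print(timeit.timeit(lambda: myNumbaMean(big), number=10) / 10)        # numba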

    As you can see, with numba this is often quite disorienting: what we are used to considering the worst solution in python often happens to be the best one, because it just does the job without subcalls and unnecessary intermediary results (such as in the first numba version, where we call np.mean and np.sum).

    And, what is even annoying, that naive version even beats the pure numpy one (data.sum(axis=(1,2))). That being said, I would not advise using numba when you have a pure numpy version. Indeed, this time it beats pure numpy, but only because the pure numpy code does nothing smarter than my code: there is no room to be smart here, it is just a sum, and you cannot avoid iterating over all elements and summing them. The only thing that makes numpy faster than, for example, your code is that it is compiled C code vs interpreted python code; against numba, it is compiled code vs compiled code.

    But for many computations, the authors of numpy have implemented more efficient algorithms than the first one I would come up with. So rewriting everything in numba would not only be a waste of time, but would more often than not lead to functions not even as fast as numpy.

    So, generally speaking, I would always try to "think numpy" first, as I did in my first answer. Then, if that is not possible, write numba code. Then, if that is not possible either (for example because numba is not available), try to "map a function", while being aware that this spares only the time spent in the for loop itself (the iteration), not the body of the loop, which is where most of the CPU time is usually spent.

    Last remark: in this case, my timings are not that impressive, a ×4 gain at most. That seems a lot already, but it is nothing compared to what we often see here when suppressing python for loops (where ×100 or ×1000 gain factors are quite common). This is because your naive code is not that slow: only one for loop, the outer one, is done in python. The two implicit inner loops are inside sum's and mean's code. All together, you have 5000 + 5000×200 + 5000×200×20 iterations, and the 5000×200 and 5000×200×20 are already vectorized in numpy. So what we saved here are just 5000 iterations and 5000 calls.