I have an ndarray, arr, that is an n-dimensional cube with length m in each dimension. I want to apply a function, func, by slicing along dimension 0 and taking each (n-1)-dim slice as an input to the function. This works with map(), but I can't find a numpy variant that is appropriate. np.vectorize seems to split the (n-1)-tensor into individual scalar entries. Neither apply_along_axis nor apply_over_axes seems appropriate either. My problem is such that I need to pass arbitrary functions as inputs, so I do not see a solution with einsum being feasible either.
numpy alternative to using np.asarray(map(func, arr))?

I define an example array, arr, as a 4-dim cube (or 4-tensor) by:
import numpy as np

m, n = 3, 4
arr = np.arange(m**n).reshape((m,)*n)
I define an example function, f:

def f(x):
    """makes it obvious how the np.ndarray is being passed into the function"""
    try:  # perform an op using x[0,0,0], which is expected to exist
        i = x[0, 0, 0]
    except (IndexError, TypeError):  # x is a scalar or has too few dimensions
        print('\nno element x[0,0,0] in x: \n{}'.format(x))
        return np.nan
    return x - x + i
The expected result, res, from this function would keep the same shape but would satisfy the following:

print(all([(res[i] == i*m**(n-1)).all() for i in range(m)]))
This works with the built-in map() function:

res = np.asarray(list(map(f, arr)))
print(all([(res[i] == i*m**(n-1)).all() for i in range(m)]))
True
I would expect np.vectorize to work in the same way as map(), but it acts on scalar entries:

res = np.vectorize(f)(arr)

no element x[0,0,0] in x: 
0
...
Given that arr is 4d and your func works on 3d arrays,

np.asarray(list(map(func, arr)))  # plain map() sufficed in Py2

looks perfectly reasonable. I'd use the list comprehension form, but that's a matter of programming style:

np.asarray([func(i) for i in arr])
for i in arr iterates on the first dimension of arr. In effect it treats arr as a list of 3d arrays, and then reassembles the resulting list into a 4d array.
np.vectorize's doc could be more explicit about the function receiving scalars, but yes, it passes values as scalars. Note that np.vectorize has no provision for passing an iteration-axis parameter. It's most useful when your function takes values from several arrays, something like

[func(a, b) for a, b in zip(arrA, arrB)]

It generalizes the zip to allow for broadcasting. But otherwise it is an iterative solution; it knows nothing about the guts of your func, so it can't speed up its calls.
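A minimal sketch of that zip-plus-broadcasting behaviour (add_pair, arrA and arrB are hypothetical names for illustration):

def add_pair(a, b):    # receives scalars, one pair at a time
    return a + b

arrA = np.arange(3)             # shape (3,)
arrB = np.arange(4)[:, None]    # shape (4, 1)
print(np.vectorize(add_pair)(arrA, arrB).shape)   # (4, 3): broadcast, not just zipped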
np.vectorize ends up calling np.frompyfunc, which, being a bit less general, is a bit faster. But it too passes scalars to the func.
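For reference, a minimal frompyfunc sketch; it takes (func, nin, nout) and returns an object-dtype array:

fadd = np.frompyfunc(lambda a, b: a + b, 2, 1)   # 2 scalar inputs, 1 output
out = fadd(np.arange(3), np.arange(3))
print(out, out.dtype)    # [0 2 4] object; cast back with out.astype(int)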
np.apply_along_axis and np.apply_over_axes also iterate over one or more axes. You may find their code instructive, but I agree they don't apply here.
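To see why apply_along_axis doesn't fit: it feeds the function 1d slices, never (n-1)-dim blocks. A quick probe with the arr above:

probe = np.apply_along_axis(lambda v: v.shape[0], 0, arr)
print(probe.shape)    # (3, 3, 3): one result per 1d slice along axis 0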
A variation on the map approach is to allocate the result array and index:

In [45]: res = np.zeros_like(arr, int)
In [46]: for i in range(arr.shape[0]):
    ...:     res[i, ...] = f(arr[i, ...])

This may be easier if you need to iterate on an axis other than the first. You'll need to do your own timings to see which approach is faster.
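For instance, a sketch of the same idea iterating on axis 1 instead (f still receives 3d slices):

res = np.zeros_like(arr, int)
for j in range(arr.shape[1]):
    res[:, j, ...] = f(arr[:, j, ...])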
========================
An example of iteration over the 1st dimension with in-place modification:
In [58]: arr.__array_interface__['data']   # data buffer address
Out[58]: (152720784, False)
In [59]: for i, a in enumerate(arr):
    ...:     print(a.__array_interface__['data'])
    ...:     a[0, 0, :] = i
    ...: 
(152720784, False)   # addresses of the views (same buffer)
(152720892, False)
(152721000, False)
In [60]: arr
Out[60]: 
array([[[[ 0,  0,  0],
         [ 3,  4,  5],
         [ 6,  7,  8]],
         ...
       [[[ 1,  1,  1],
         [30, 31, 32],
         ...
       [[[ 2,  2,  2],
         [57, 58, 59],
         [60, 61, 62]],
        ...]]])
When I iterate over an array, I get a view that starts at successive points in the common data buffer. If I modify the view, as above or even with a[:] = ..., I modify the original; I don't have to write anything back. But don't use a = ..., which rebinds the name and breaks the link to the original array.
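A minimal sketch of that distinction (b and row are hypothetical names):

b = np.zeros((2, 3), int)
row = b[0]           # a view into b's buffer
row[:] = 7           # writes through the view; b[0] becomes all 7s
row = np.ones(3)     # rebinds the name only; b is unchanged
print(b)             # [[7 7 7], [0 0 0]]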