I have an ndarray, arr, that is an n-dimensional cube with length m in each dimension. I want to apply a function, func, by slicing along dimension 0 and taking each (n-1)-dim slice as an input to the function. This works with map(), but I can't find a numpy variant that is appropriate. np.vectorize seems to split the (n-1)-tensor into individual scalar entries. Neither apply_along_axis nor apply_over_axes seems appropriate either. My problem is such that I need to pass arbitrary functions as inputs, so I do not see a solution with einsum being feasible either.
numpy alternative to using np.asarray(map(func, arr))?

I define an example array, arr, as a 4-dim cube (or 4-tensor) by:
import numpy as np

m, n = 3, 4
arr = np.arange(m**n).reshape((m,)*n)
I define an example function, f:

def f(x):
    """makes it obvious how the np.ndarray is being passed into the function"""
    try:  # perform an op using x[0,0,0], which is expected to exist
        i = x[0, 0, 0]
    except (IndexError, TypeError):  # x is a scalar or has too few dimensions
        print('\nno element x[0,0,0] in x: \n{}'.format(x))
        return np.nan
    return x - x + i
The expected result, res, from this function would keep the same shape but would satisfy the following:

print(all([(res[i] == i*m**(n-1)).all() for i in range(m)]))
This works with the built-in map() function:

res = np.asarray(list(map(f, arr)))
print(all([(res[i] == i*m**(n-1)).all() for i in range(m)]))
True
I would expect np.vectorize to work in the same way as map(), but it acts on scalar entries:

res = np.vectorize(f)(arr)

no element x[0,0,0] in x: 
0
...
Given that arr is 4d and your func works on 3d arrays,

np.asarray(list(map(func, arr)))  # plain map() sufficed in Py2

looks perfectly reasonable. I'd use the list comprehension form, but that's a matter of programming style:

np.asarray([func(i) for i in arr])
for i in arr iterates on the first dimension of arr. In effect it treats arr as a list of 3d arrays, and then reassembles the resulting list into a 4d array.
np.vectorize's doc could be more explicit about the function receiving scalars, but yes, it passes values as scalars. Note that np.vectorize has no provision for passing an iteration-axis parameter. It's most useful when your function takes values from several arrays, something like

[func(a, b) for a, b in zip(arrA, arrB)]

It generalizes the zip to allow for broadcasting. But otherwise it is an iterative solution; it knows nothing about the guts of your func, so it can't speed up its calls.
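A minimal sketch of that zip-plus-broadcasting behaviour (add_pair, arrA and arrB are hypothetical names for illustration):

def add_pair(a, b):    # receives scalars, one pair at a time
    return a + b

arrA = np.arange(3)             # shape (3,)
arrB = np.arange(4)[:, None]    # shape (4, 1)
print(np.vectorize(add_pair)(arrA, arrB).shape)   # (4, 3): broadcast, not just zipped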
np.vectorize ends up calling np.frompyfunc, which, being a bit less general, is a bit faster. But it too passes scalars to the func.
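For reference, a minimal frompyfunc sketch; it takes (func, nin, nout) and returns an object-dtype array:

fadd = np.frompyfunc(lambda a, b: a + b, 2, 1)   # 2 scalar inputs, 1 output
out = fadd(np.arange(3), np.arange(3))
print(out, out.dtype)    # [0 2 4] object; cast back with out.astype(int)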
np.apply_along_axis and np.apply_over_axes also iterate over one or more axes. You may find their code instructive, but I agree they don't apply here.
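To see why apply_along_axis doesn't fit: it feeds the function 1d slices, never (n-1)-dim blocks. A quick probe with the arr above:

probe = np.apply_along_axis(lambda v: v.shape[0], 0, arr)
print(probe.shape)    # (3, 3, 3): one result per 1d slice along axis 0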
A variation on the map approach is to allocate the result array and index:

In [45]: res = np.zeros_like(arr, int)
In [46]: for i in range(arr.shape[0]):
    ...:     res[i, ...] = f(arr[i, ...])

This may be easier if you need to iterate on an axis other than the first. You'll need to do your own timings to see which approach is faster.
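For instance, a sketch of the same idea iterating on axis 1 instead (f still receives 3d slices):

res = np.zeros_like(arr, int)
for j in range(arr.shape[1]):
    res[:, j, ...] = f(arr[:, j, ...])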
========================
An example of iteration over the 1st dimension with in-place modification:
In [58]: arr.__array_interface__['data']   # data buffer address
Out[58]: (152720784, False)
In [59]: for i, a in enumerate(arr):
    ...:     print(a.__array_interface__['data'])
    ...:     a[0, 0, :] = i
    ...: 
(152720784, False)   # addresses of the views (same buffer)
(152720892, False)
(152721000, False)
In [60]: arr
Out[60]: 
array([[[[ 0,  0,  0],
         [ 3,  4,  5],
         [ 6,  7,  8]],
         ...
       [[[ 1,  1,  1],
         [30, 31, 32],
         ...
       [[[ 2,  2,  2],
         [57, 58, 59],
         [60, 61, 62]],
        ...]]])
When I iterate over an array, I get a view that starts at successive points in the common data buffer. If I modify the view, as above or even with a[:] = ..., I modify the original; I don't have to write anything back. But don't use a = ..., which rebinds the name and breaks the link to the original array.
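A minimal sketch of that distinction (b and row are hypothetical names):

b = np.zeros((2, 3), int)
row = b[0]           # a view into b's buffer
row[:] = 7           # writes through the view; b[0] becomes all 7s
row = np.ones(3)     # rebinds the name only; b is unchanged
print(b)             # [[7 7 7], [0 0 0]]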