This problem only seems to arise when my dummy function returns an array, so that a multidimensional array is being created.
I reduced the issue to the following example:
import numpy as np

def dummy(x):
    y = np.array([np.sin(x), np.cos(x)])
    return y

x = np.array([0, np.pi/2, np.pi])
The code I want to optimize looks like this:
y = []
for x_i in x:
    y_i = dummy(x_i)
    y.append(y_i)
y = np.array(y)
So I thought I could use vectorize to get rid of the slow loop:
y = np.vectorize(dummy)(x)
But this results in
ValueError: setting an array element with a sequence.
Where even is the sequence that the error is talking about?!
Your function returns an array when given a scalar:
In [233]: def dummy(x):
     ...:     y = np.array([np.sin(x), np.cos(x)])
     ...:     return y
     ...:
In [234]: dummy(1)
Out[234]: array([0.84147098, 0.54030231])
In [235]: f = np.vectorize(dummy)
In [236]: f([0,1,2])
...
ValueError: setting an array element with a sequence.
vectorize constructs an empty result array and tries to put the result of each calculation into it. But a cell of a numeric target array can hold only a single value, not an array.
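The same error can be reproduced without vectorize at all. The following is a minimal sketch of the failure mode, not vectorize's actual internals; the names row, numeric, and boxed are only illustrative:

```python
import numpy as np

row = np.array([0.0, 1.0])           # what dummy returns for a single input

numeric = np.empty(3, dtype=float)   # each cell holds exactly one float
try:
    numeric[0] = row                 # a 2-element array does not fit in one cell
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

boxed = np.empty(3, dtype=object)    # each cell holds a reference to any object
boxed[0] = row                       # works: the whole array is stored as one item
print(boxed[0])
```

The "sequence" in the message is the array that the function returned for one element.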
If we specify an otypes parameter, it does work:
In [237]: f = np.vectorize(dummy, otypes=[object])
In [238]: f([0,1,2])
Out[238]:
array([array([0., 1.]), array([0.84147098, 0.54030231]),
       array([ 0.90929743, -0.41614684])], dtype=object)
That is, each array returned by dummy is put in an element of a shape (3,) result array.
Since the component arrays all have the same shape, we can stack them:
In [239]: np.stack(_)
Out[239]:
array([[ 0.        ,  1.        ],
       [ 0.84147098,  0.54030231],
       [ 0.90929743, -0.41614684]])
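Putting the two steps together as a script, here is a small sketch that checks the otypes plus np.stack route against the plain loop from the question. The names via_vectorize and via_loop are made up for this comparison:

```python
import numpy as np

def dummy(x):
    return np.array([np.sin(x), np.cos(x)])

x = np.array([0, np.pi / 2, np.pi])

# Step 1: object-dtype result, one small array per input element.
f = np.vectorize(dummy, otypes=[object])
# Step 2: combine the equally shaped pieces into one numeric array.
via_vectorize = np.stack(f(x))

# The explicit loop from the question, for comparison.
via_loop = np.array([dummy(x_i) for x_i in x])

print(via_vectorize.shape)                   # (3, 2)
print(np.allclose(via_vectorize, via_loop))  # True
```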
But as noted, vectorize does not promise a speedup. I suspect we could also use the newer signature parameter, but that's even slower.
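For reference, a sketch of what the signature variant might look like, assuming the signature string '()->(n)' (scalar in, 1d array out) is the right description of dummy. It returns the stacked array directly, though it is not expected to be faster:

```python
import numpy as np

def dummy(x):
    return np.array([np.sin(x), np.cos(x)])

x = np.array([0, np.pi / 2, np.pi])

# '()->(n)': each call takes a scalar and returns a 1d array of some length n.
g = np.vectorize(dummy, signature='()->(n)')
y = g(x)

print(y.shape)  # (3, 2)
```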
vectorize makes some sense if your function takes several scalar arguments and you'd like to take advantage of numpy broadcasting when feeding it sets of values. But as a replacement for a simple iteration over a 1d array, it isn't an improvement.
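To illustrate the multi-argument case, here is a sketch with a made-up scalar function, safe_ratio, which is not from the question. Its if-test only works on scalars, and vectorize handles the broadcasting of a (3, 1) array against a (4,) array:

```python
import numpy as np

def safe_ratio(a, b):
    # Hypothetical scalar-only logic: this 'if' would fail on whole arrays.
    return a / b if b != 0 else 0.0

vratio = np.vectorize(safe_ratio, otypes=[float])

a = np.array([[1.0], [2.0], [3.0]])  # shape (3, 1)
b = np.array([0.0, 1.0, 2.0, 4.0])   # shape (4,)

out = vratio(a, b)                   # inputs are broadcast against each other
print(out.shape)                     # (3, 4)
```

Each cell of out still comes from one Python-level call to safe_ratio, so this is a convenience for broadcasting rather than a speedup.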