python numpy vectorization scientific-computing

Using `numpy.vectorize` to create multidimensional array results in ValueError: setting an array element with a sequence


This problem only seems to arise when my dummy function returns an array, so that the combined result would be a multidimensional array.

I reduced the issue to the following example:

import numpy as np

def dummy(x):
    y = np.array([np.sin(x), np.cos(x)])
    return y

x = np.array([0, np.pi/2, np.pi])

The code I want to optimize looks like this:

y = []
for x_i in x:
    y_i = dummy(x_i)
    y.append(y_i)
y = np.array(y)

So I thought I could use vectorize to get rid of the slow loop:

y = np.vectorize(dummy)(x)

But this results in

ValueError: setting an array element with a sequence.

Where even is the sequence that the error is talking about?!


Solution

  • Your function returns an array when given a scalar:

    In [233]: def dummy(x):
         ...:     y = np.array([np.sin(x), np.cos(x)])
         ...:     return y
         ...: 
         ...: 
    In [234]: dummy(1)
    Out[234]: array([0.84147098, 0.54030231])
    
    
    
    In [235]: f = np.vectorize(dummy)
    In [236]: f([0,1,2])
    ...
    ValueError: setting an array element with a sequence.
    

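    vectorize infers the output dtype from a trial call on the first element, constructs an empty result array of that dtype, and tries to put the result of each calculation in it. But a cell of the target array cannot accept an array; that returned array is the "sequence" the error message is talking about.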

    If we specify an otypes parameter, it does work:

    In [237]: f = np.vectorize(dummy, otypes=[object])
    In [238]: f([0,1,2])
    Out[238]: 
    array([array([0., 1.]), array([0.84147098, 0.54030231]),
           array([ 0.90929743, -0.41614684])], dtype=object)
    

    That is, each dummy array is put in an element of a shape (3,) result array.

    Since the component arrays all have the same shape, we can stack them:

    In [239]: np.stack(_)
    Out[239]: 
    array([[ 0.        ,  1.        ],
           [ 0.84147098,  0.54030231],
           [ 0.90929743, -0.41614684]])
    

    But as the NumPy documentation notes, vectorize is provided for convenience rather than performance, so it does not promise a speedup. I suspect we could also use the newer signature parameter, but that's even slower.

    vectorize makes some sense if your function takes several scalar arguments, and you'd like to take advantage of numpy broadcasting when feeding sets of values. But as a replacement for a simple iteration over a 1d array, it isn't an improvement.
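
For comparison, here is a minimal sketch of the two alternatives to the object-array route. It assumes the same `dummy` and `x` as in the question; the names `y_loop`, `y_sig` and `y_direct` are made up for illustration. The first variant uses the `signature` parameter mentioned above, the second drops vectorize entirely and relies on `np.sin` and `np.cos` already accepting whole arrays, which is probably what you want if the real function can be written that way.

```python
import numpy as np

def dummy(x):
    # Scalar in, length-2 array out (same function as in the question).
    return np.array([np.sin(x), np.cos(x)])

x = np.array([0, np.pi / 2, np.pi])

# Reference result: the explicit loop from the question.
y_loop = np.array([dummy(x_i) for x_i in x])

# Variant 1: declare that each scalar input produces a 1-D output.
# "()->(n)" reads: scalar in, vector of some length n out.
y_sig = np.vectorize(dummy, signature="()->(n)")(x)

# Variant 2: no vectorize at all; apply the ufuncs to the whole array
# and stack the two results along a new last axis.
y_direct = np.stack([np.sin(x), np.cos(x)], axis=-1)

print(y_loop.shape, y_sig.shape, y_direct.shape)
```

All three results have shape (3, 2) and hold the same values. Only the second variant avoids calling Python code once per element, so it is the only one I would expect to be noticeably faster than the loop.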