Tags: numpy, multidimensional-array, vectorization, numpy-ndarray

numpy: How to vectorize a function returning arrays of different shapes?


I need to vectorize a function that can return arrays of shape either (2,3) or (3,3). Is it possible?

I create my vectorized function like this:

my_func_v = np.vectorize(my_func, signature='()->(n,m)')

As long as every call returns the same shape (all (2,3) or all (3,3)), it works well. But as soon as the result shapes are mixed, numpy fails with the error:

ValueError: could not broadcast input array from shape (3,3) into shape (2,3)

Is it possible to mix returned shapes from a vectorized function?


Solution

  • I probably shouldn't take the time since you haven't provided a minimal working example. But let me illustrate what signature does:

    In [190]: f = np.vectorize(lambda x: x * np.ones((2, 3), int), signature="()->(n,m)")
    In [191]: f(np.arange(4))
    Out[191]: 
    array([[[0, 0, 0],
            [0, 0, 0]],
    
           [[1, 1, 1],
            [1, 1, 1]],
    
           [[2, 2, 2],
            [2, 2, 2]],
    
           [[3, 3, 3],
            [3, 3, 3]]])
    

    The result has the shape of the argument with (n,m) appended. And obviously, in a numeric array those last 2 dimensions must be the same for every element; they can't be mixed. The call above produces shape (4,2,3); the one below produces (2,2,2,3):

    In [192]: f(np.arange(4).reshape(2,2))
    Out[192]: 
    array([[[[0, 0, 0],
             [0, 0, 0]],
    
            [[1, 1, 1],
             [1, 1, 1]]],
    
    
           [[[2, 2, 2],
             [2, 2, 2]],
    
            [[3, 3, 3],
             [3, 3, 3]]]])
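
Since the question has no minimal example, here is a sketch of how the mixed-shape failure can be reproduced. The function my_func below is a hypothetical stand-in (an assumption, not the asker's code): it returns a (2,3) array for even inputs and a (3,3) array for odd ones. With a signature, the output array is sized from the first result, so a later result of a different shape cannot be stored in it.

```python
import numpy as np

def my_func(x):
    # Hypothetical stand-in for the asker's function:
    # even x gives shape (2, 3), odd x gives shape (3, 3).
    rows = 2 if x % 2 == 0 else 3
    return x * np.ones((rows, 3), int)

my_func_v = np.vectorize(my_func, signature="()->(n,m)")

error = None
try:
    # The first result (x=0) has shape (2, 3); the second (x=1) has
    # shape (3, 3) and does not fit the slot allocated for it.
    my_func_v(np.arange(4))
except ValueError as exc:
    error = exc

print(type(error).__name__, error)
```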
    

    If I drop the signature and specify an object return type:

    In [194]: f = np.vectorize(lambda x: x * np.ones((2, 3), int), otypes=['object'])
    In [195]: f(np.arange(4).reshape(2, 2))
    Out[195]: 
    array([[array([[0, 0, 0],
                   [0, 0, 0]]), array([[1, 1, 1],
                                       [1, 1, 1]])],
           [array([[2, 2, 2],
                   [2, 2, 2]]), array([[3, 3, 3],
                                       [3, 3, 3]])]], dtype=object)
    

    Now the shapes of the element arrays can vary.
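
To make that concrete for the mixed (2,3)/(3,3) case from the question, here is a minimal sketch. Again, my_func is a hypothetical stand-in (even inputs give (2,3), odd inputs give (3,3)), not the asker's actual function.

```python
import numpy as np

def my_func(x):
    # Hypothetical stand-in: even x gives (2, 3), odd x gives (3, 3).
    rows = 2 if x % 2 == 0 else 3
    return x * np.ones((rows, 3), int)

# No signature; each result is stored as a single Python object.
my_func_v = np.vectorize(my_func, otypes=[object])

result = my_func_v(np.arange(4))
shapes = [a.shape for a in result]

print(result.shape, result.dtype)  # shape of the argument, dtype object
print(shapes)                      # the per-element shapes differ
```

The outer array only has the shape of the argument; each element is an independent array, so whole-array numeric operations across the (n,m) dimensions are not available.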

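    Generally I discourage the use of np.vectorize, since it isn't real "vectorization". Its documentation carries a clear performance disclaimer: it is provided for convenience, and the implementation is essentially a for loop. And an object dtype array is little better, and in some ways worse, than a list.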

    In [196]: timeit f(np.arange(1000))
    9.45 ms ± 67.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    In [197]: timeit [x*np.ones((2,3),int) for x in range(1000)]
    9.46 ms ± 290 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    In [198]: timeit np.array([x*np.ones((2,3),int) for x in range(1000)])
    9.83 ms ± 20.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    

    It used to be that np.vectorize was slower than the equivalent list comprehension. Now it is still slower for small arguments, but it scales better.
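
If a plain list is acceptable, one possible alternative (a sketch, not the only approach) is to collect the results with a list comprehension and then stack the results that share a shape into ordinary numeric arrays. The function my_func and the grouping dictionary are illustrative assumptions.

```python
from collections import defaultdict

import numpy as np

def my_func(x):
    # Hypothetical stand-in: even x gives (2, 3), odd x gives (3, 3).
    rows = 2 if x % 2 == 0 else 3
    return x * np.ones((rows, 3), int)

# Plain Python loop; no np.vectorize and no object dtype array.
results = [my_func(x) for x in range(4)]

# Group the results by shape so each group can become a numeric array.
by_shape = defaultdict(list)
for arr in results:
    by_shape[arr.shape].append(arr)

stacked = {shape: np.stack(arrs) for shape, arrs in by_shape.items()}

for shape, arr in stacked.items():
    print(shape, "->", arr.shape)
```

Each stacked group is a regular integer array, so the usual fast numpy operations apply within a group. The grouping loses the original ordering, so keep the indices as well if the order matters.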