python numpy numpy-ndarray array-broadcasting

Why does this error when converting a Python list of lists to a NumPy array only occur in specific circumstances?


I have a somewhat peculiar structure, a Python list of lists, that I need to convert to a NumPy array. So far I have managed to get by using np.array(myarray, dtype = object). However, a seemingly insignificant change to the structure of myarray now causes an error.

I have managed to reduce my issue to two lines of code. The following is what I was using previously, and it works exactly how I want it to:

import numpy as np
myarray = [np.array([[1,2,3,4],[5,6,7,8]]), np.array([[9,10],[11,12]]), np.array([[13,14],[15,16],[17,18]])]
np.array(myarray,dtype = object)

However, after simply removing the last [17,18] row, we have:

import numpy as np
myarray = [np.array([[1,2,3,4],[5,6,7,8]]), np.array([[9,10],[11,12]]), np.array([[13,14],[15,16]])]
np.array(myarray,dtype = object)

This gives "ValueError: could not broadcast input array from shape (2,4) into shape (2,)" when it reaches the np.array call.

It seems to me that this only happens when the arrays all have the same length but the underlying lists have different lengths. What I don't understand is why setting dtype = object doesn't cover this, especially considering that it handles the more complicated list of lists shape.


Solution

  • np.array tries, as first priority, to make an n-d numeric array: one where all elements are numeric and the shape is consistent in all dimensions, i.e. not a 'ragged' array.

    In [36]: alist = [np.array([[1,2,3,4],[5,6,7,8]]), 
    np.array([[9,10],[11,12]]), np.array([[13,14],[15,16],[17,18]])]
        
    In [38]: [a.shape for a in alist]
    Out[38]: [(2, 4), (2, 2), (3, 2)]
    

    alist works, making a 3-element array of arrays.

    Your problem case:

    In [39]: blist = [np.array([[1,2,3,4],[5,6,7,8]]), np.array([[9,10],[11,12]]), np.array([[13,14],[15,16]])]
    
    In [40]: [a.shape for a in blist]
    Out[40]: [(2, 4), (2, 2), (2, 2)]
    

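    Note that all subarrays have the same first dimension. That's what's giving the problem: np.array appears to commit to a (3,2) result and then cannot fit the (2,4) subarray into a (2,) slot, which is the broadcast error quoted in the question.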
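
The failing step can be reproduced in isolation. This is only a sketch of the mechanism: it assumes, based on the error message, that np.array ends up filling something like a (3,2) object array, and the names target and first are made up for illustration.

```python
import numpy as np

# Assumption: because every subarray in blist has first dimension 2,
# np.array ends up filling something like a (3, 2) object array.
target = np.empty((3, 2), dtype=object)

first = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])  # shape (2, 4)

try:
    # target[0] has shape (2,), so the (2, 4) array cannot be broadcast into it
    target[0] = first
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```

The printed message should match the broadcast error from the question, which suggests the trouble is in the shape np.array picks, not in dtype=object itself.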

    The safe way to make such an array is to start with a 'dummy' of the right shape, and fill it:

    In [41]: res = np.empty(3,object); res[:] = blist; res
    Out[41]: 
    array([array([[1, 2, 3, 4],
                  [5, 6, 7, 8]]), array([[ 9, 10],
                                         [11, 12]]), array([[13, 14],
                                                            [15, 16]])],
          dtype=object)
    
    In [42]: res = np.empty(3,object); res[:] = alist; res
    Out[42]: 
    array([array([[1, 2, 3, 4],
                  [5, 6, 7, 8]]), array([[ 9, 10],
                                         [11, 12]]), array([[13, 14],
                                                            [15, 16],
                                                            [17, 18]])],
          dtype=object)
    

    It also works when all subarrays/lists have the same shape:

    In [43]: clist = [np.array([[1,2],[7,8]]), np.array([[9,10],[11,12]]), np.array([[13,14],[15,16]])]
    
    In [44]: res = np.empty(3,object); res[:] = clist; res
    Out[44]: 
    array([array([[1, 2],
                  [7, 8]]), array([[ 9, 10],
                                   [11, 12]]), array([[13, 14],
                                                      [15, 16]])],
          dtype=object)
    

    Without that, np.array on clist produces a (3,2,2) array of number objects:

    In [45]: np.array(clist, object)
    Out[45]: 
    array([[[1, 2],
            [7, 8]],
    
           [[9, 10],
            [11, 12]],
    
           [[13, 14],
            [15, 16]]], dtype=object)
    

    One way to think of it: np.array does not give you a way of specifying the 'depth' or 'shape' of the object array. It has to 'guess', and in some cases it guesses wrong.
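
If this comes up often, the 'dummy and fill' approach can be wrapped in a small helper. The function name to_object_array is hypothetical, not part of NumPy, and this is only a minimal sketch. It fills one slot at a time instead of using res[:] = ..., so no broadcasting is involved.

```python
import numpy as np

def to_object_array(items):
    """Hypothetical helper: store each item in one slot of a 1-d object array."""
    res = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        res[i] = item  # scalar index on an object array stores the item as-is
    return res

blist = [np.array([[1, 2, 3, 4], [5, 6, 7, 8]]),
         np.array([[9, 10], [11, 12]]),
         np.array([[13, 14], [15, 16]])]
clist = [np.array([[1, 2], [7, 8]]),
         np.array([[9, 10], [11, 12]]),
         np.array([[13, 14], [15, 16]])]

b_arr = to_object_array(blist)            # the problem case from the question
c_arr = to_object_array(clist)            # all subarrays share one shape
c_guess = np.array(clist, dtype=object)   # np.array left to guess the depth

print(b_arr.shape, c_arr.shape, c_guess.shape)
```

Here the helper returns shape (3,) for both lists, while np.array(clist, dtype=object) returns shape (3, 2, 2). In other words, the helper fixes the 'depth' at one level regardless of what the subarrays look like.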