numpy scipy sparse-matrix array-broadcasting

ValueError: could not broadcast input array from shape (49041,4) into shape (49041)


This is my code. I'm getting a broadcast error and I'm unable to understand why. I have looked at other similar questions, which talked about problems with dimensions, but I was unable to find the problem. Any help is appreciated. Thanks in advance. I have attached an image of the code and the error.

Both arrays (ns and X_train_grade_encoded) are of the same shape, so why is there an error?


Solution

  • So I looked at your notebook image. It is a small PNG that has to be zoomed in to be read. We strongly encourage, some even demand, that you copy-n-paste code and errors. We need to see the problem right up front, not hidden. Otherwise we are likely to move on to the next question.

    Broadcast errors usually occur when doing some sort of math on two arrays, or (my second guess) when assigning one array to a slice of another. But this case is a more obscure one: it comes from trying to make an object dtype array from (n,4) and (n,300) shaped arrays.
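A minimal sketch of those two common sources, using small made-up shapes rather than the asker's data. The second case produces the same kind of message as the one in the question:

```python
import numpy as np

a = np.ones((3, 4))
b = np.ones((3, 2))

# 1. Elementwise math on arrays whose trailing dimensions (4 vs 2) are incompatible
try:
    a + b
    math_failed = False
except ValueError as err:
    math_failed = True
    print("math:", err)

# 2. Assigning an array into a slice that has a different shape
target = np.zeros(3)          # shape (3,)
try:
    target[:] = a             # a (3, 4) array cannot be broadcast into (3,)
    slice_failed = False
except ValueError as err:
    slice_failed = True
    print("slice:", err)
```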

    You are doing hstack((ns, array2)). With an ordinary np.hstack that would work and produce an (n,304) shaped array. But you are using scipy.sparse.hstack. I don't know if that was intentional or a mistake; you haven't hinted that you are working with sparse matrices.
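For comparison, a sketch of the plain numpy join, using small hypothetical stand-ins (n = 5 instead of 49041; `other` is an invented name for the (n,300) array):

```python
import numpy as np

n = 5                              # stand-in for the real row count
ns = np.ones((n, 4))               # dense, like the (n, 4) array in the question
other = np.zeros((n, 300))         # hypothetical stand-in for the (n, 300) array

combined = np.hstack((ns, other))  # ordinary column-wise join of dense arrays
print(combined.shape)              # (5, 304)
```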

    ns probably was constructed from a sparse matrix, since you use toarray(). But it is now a dense (numpy) array.
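A quick way to check that, sketched with a small hypothetical sparse matrix `m` (the asker's real source matrix is not shown):

```python
import numpy as np
from scipy import sparse

# Hypothetical small sparse source matrix
m = sparse.csr_matrix(np.array([[0, 1, 0, 0],
                                [2, 0, 0, 3]]))
ns = m.toarray()   # toarray() returns a dense numpy.ndarray

print(sparse.issparse(m), sparse.issparse(ns))   # True False
print(isinstance(ns, np.ndarray), ns.shape)      # True (2, 4)
```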

    sparse.hstack is intended for sparse matrices, and returns a sparse matrix. I don't know the exact limits on using dense array inputs. I believe it can convert dense inputs to coo format and then do its join, but here the error occurred before it got to that step, in the np.asarray(blocks, dtype='object') call.
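If a sparse result is actually wanted (an assumption about intent, not something the question confirms), one safer sketch is to convert every block to a sparse matrix yourself before calling sparse.hstack, so no dense array ever reaches the object-array step. The arrays are small hypothetical stand-ins:

```python
import numpy as np
from scipy import sparse

ns = np.ones((5, 4))               # dense array, e.g. the result of .toarray()
other = np.zeros((5, 300))         # hypothetical dense stand-in for the second array

# Make each block sparse explicitly, then join them as sparse matrices.
joined = sparse.hstack([sparse.csr_matrix(ns), sparse.csr_matrix(other)], format="csr")

print(sparse.issparse(joined), joined.shape)   # True (5, 304)
```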


    This reproduces your error:

    In [37]: from scipy import sparse  
    

    Trying to use sparse hstack on two dense arrays:

    In [38]: sparse.hstack([np.ones((3,4)),np.zeros((3,2))])                        
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-38-a9d8036b5a44> in <module>
    ----> 1 sparse.hstack([np.ones((3,4)),np.zeros((3,2))])
    
    /usr/local/lib/python3.6/dist-packages/scipy/sparse/construct.py in hstack(blocks, format, dtype)
        463 
        464     """
    --> 465     return bmat([blocks], format=format, dtype=dtype)
        466 
        467 
    
    /usr/local/lib/python3.6/dist-packages/scipy/sparse/construct.py in bmat(blocks, format, dtype)
        543     """
        544 
    --> 545     blocks = np.asarray(blocks, dtype='object')
        546 
        547     if blocks.ndim != 2:
    
    /usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
         83 
         84     """
    ---> 85     return array(a, dtype, copy=False, order=order)
         86 
         87 
    
    ValueError: could not broadcast input array from shape (3,4) into shape (3)
    

    But if we first convert one of them (even just the second) to sparse, it works:

    In [39]: sparse.hstack([np.ones((3,4)),sparse.coo_matrix(np.zeros((3,2)))])     
    Out[39]: 
    <3x6 sparse matrix of type '<class 'numpy.float64'>'
        with 12 stored elements in COOrdinate format>
    In [40]: _.A                                                                    
    Out[40]: 
    array([[1., 1., 1., 1., 0., 0.],
           [1., 1., 1., 1., 0., 0.],
           [1., 1., 1., 1., 0., 0.]])
    

    Of course, the right way to join two dense arrays is np.hstack:

    In [41]: np.hstack([np.ones((3,4)),np.zeros((3,2))])                            
    Out[41]: 
    array([[1., 1., 1., 1., 0., 0.],
           [1., 1., 1., 1., 0., 0.],
           [1., 1., 1., 1., 0., 0.]])
    

    The np.array(..., object) error is a bit obscure. It arises because both arrays are dense and have the same first dimension, so numpy tries to build a (1, 2, 3) object array and then cannot fit a (3,4) block into a (3,) slot. It's a known issue in numpy. Since sparse.hstack was intended for use on sparse matrices, its developers can be excused for ignoring this numpy misuse.
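The same failure can be sketched with numpy alone, mirroring the np.asarray(blocks, dtype='object') call in the traceback. The np.empty-then-assign part at the end is a common workaround for building an object array of blocks, shown only to illustrate the 2-D block grid that bmat expects; it is not part of the original answer:

```python
import numpy as np

a = np.ones((3, 4))
b = np.zeros((3, 2))

# Same call as in bmat: both blocks have 3 rows, so numpy tries to build a
# (1, 2, 3) object array and cannot fit a (3, 4) block into a (3,) slot.
try:
    np.asarray([[a, b]], dtype=object)
    coercion_failed = False
except ValueError as err:
    coercion_failed = True
    print("ValueError:", err)

# Workaround sketch: preallocate a 1x2 object grid and assign each block,
# so numpy stores the arrays as elements instead of looking inside them.
grid = np.empty((1, 2), dtype=object)
grid[0, 0] = a
grid[0, 1] = b
print(grid.shape, grid[0, 0].shape, grid[0, 1].shape)   # (1, 2) (3, 4) (3, 2)
```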

    ===

    sparse.vstack does work with dense arrays with shapes like (3,4) and (5,4), because np.array(..., object) does make a valid object dtype array from them. But if the shapes match, e.g. (3,4) and (3,4), neither hstack nor vstack works, and the error message is different from yours:

    In [66]: sparse.hstack((np.ones((3,2)),np.zeros((3,2))))                        
    ...
    ValueError: blocks must be 2-D
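The difference comes from the shape np.asarray(..., dtype=object) picks before bmat checks blocks.ndim. A sketch, assuming (as in the older SciPy traceback above) that sparse.vstack passes [[a], [b]] and sparse.hstack passes [[a, b]] to that call:

```python
import numpy as np

# vstack-style grid with different row counts (3 vs 5): numpy stops at a
# valid 2-D (2, 1) object array whose elements are the original blocks.
vgrid = np.asarray([[np.ones((3, 4))], [np.zeros((5, 4))]], dtype=object)
print(vgrid.shape)   # (2, 1)

# hstack-style grid with identical shapes: numpy builds a regular 4-D array
# instead, which is why bmat reports "blocks must be 2-D".
hgrid = np.asarray([[np.ones((3, 2)), np.zeros((3, 2))]], dtype=object)
print(hgrid.shape)   # (1, 2, 3, 2)
```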
    

    So we need to take the docs seriously: sparse.hstack and sparse.vstack are meant for sparse matrices.
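One way to act on that, sketched as a hypothetical helper `join_columns` (not a numpy or SciPy function): use np.hstack when every input is dense, and use sparse.hstack only after making every input sparse.

```python
import numpy as np
from scipy import sparse

def join_columns(*blocks):
    """Hypothetical helper: column-join blocks with the function that matches their type."""
    if any(sparse.issparse(b) for b in blocks):
        # At least one sparse input: make every block sparse, then join sparsely.
        return sparse.hstack([sparse.csr_matrix(b) for b in blocks], format="csr")
    # All inputs dense: the ordinary numpy join is the right tool.
    return np.hstack(blocks)

dense = join_columns(np.ones((3, 4)), np.zeros((3, 2)))
mixed = join_columns(np.ones((3, 4)), sparse.csr_matrix(np.zeros((3, 2))))
print(isinstance(dense, np.ndarray), dense.shape)   # True (3, 6)
print(sparse.issparse(mixed), mixed.shape)          # True (3, 6)
```

The design choice here is simply to never hand a dense array to the sparse joiner, which sidesteps the object-array coercion entirely.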