
Cannot stack NumPy arrays with hstack in Numba


I have one matrix mat of the type

array([[0.00000000e+00, 1.98300000e+03, 1.57400000e+00, ...,
                   nan,            nan, 2.38395652e+00],
       [0.00000000e+00, 1.98400000e+03, 1.80600000e+00, ...,
                   nan, 1.38395652e+00, 2.29417391e+00],
       [0.00000000e+00, 1.98500000e+03, 4.72400000e+00, ...,
        1.38395652e+00, 1.29417391e+00, 5.68147826e+00],
       ...,
       [9.87500000e+03, 1.99200000e+03, 1.59700000e+00, ...,
                   nan,            nan, 4.61641176e+00],
       [9.87500000e+03, 1.99300000e+03, 3.13400000e+00, ...,
                   nan, 3.61641176e+00, 5.45824421e+00],
       [9.87500000e+03, 1.99400000e+03, 7.61900000e+00, ...,
        3.61641176e+00, 4.45824421e+00, 1.05298571e+01]])

with dimensions (107196, 46) and one vector vec of the type

array([0.23, 0., 0.28, ..., 0.99, 1.0, 0.05])

with dimensions (107196,). I want to use the np.hstack function to append vec as the last column of mat. In plain NumPy I do this with np.hstack((mat,vec[:,None])). Now I want a Numba-jitted (nopython mode) function that contains this operation. Say

import numpy as np
from numba import jit

@jit(nopython=True)
def simul_nb(matrix, vector):
    return np.hstack((matrix,vector[:,None]))

However, when I run simul_nb(mat,vec) I get the following error:

Traceback (most recent call last):

  File "<ipython-input-340-6c5341efa9b6>", line 1, in <module>
    simul_nb(income_df,nols)

  File "C:\Users\bagna\anaconda3\lib\site-packages\numba\core\dispatcher.py", line 415, in _compile_for_args
    error_rewrite(e, 'typing')

  File "C:\Users\bagna\anaconda3\lib\site-packages\numba\core\dispatcher.py", line 358, in error_rewrite
    reraise(type(e), e, None)

  File "C:\Users\bagna\anaconda3\lib\site-packages\numba\core\utils.py", line 80, in reraise
    raise value.with_traceback(tb)

TypingError: No implementation of function Function(<built-in function getitem>) found for signature:
 
getitem(readonly array(float64, 1d, C), Tuple(slice<a:b>, none))
 
There are 16 candidate implementations:
      - Of which 14 did not match due to:
      Overload of function 'getitem': File: <numerous>: Line N/A.
        With argument(s): '(readonly array(float64, 1d, C), Tuple(slice<a:b>, none))':
       No match.
      - Of which 2 did not match due to:
      Overload in function 'GetItemBuffer.generic': File: numba\core\typing\arraydecl.py: Line 162.
        With argument(s): '(readonly array(float64, 1d, C), Tuple(slice<a:b>, none))':
       Rejected as the implementation raised a specific error:
         TypeError: unsupported array index type none in Tuple(slice<a:b>, none)
  raised from C:\Users\bagna\anaconda3\lib\site-packages\numba\core\typing\arraydecl.py:68

During: typing of intrinsic-call at <ipython-input-339-4d5037c266b7> (3)
During: typing of static-get-item at <ipython-input-339-4d5037c266b7> (3)

How do I make the function work?


Solution

  • Numba doesn't understand the [:,None] indexing used for reshaping. That indexing is equivalent to [:,np.newaxis], as you may already know, and at the present time np.newaxis isn't a supported NumPy feature in Numba, which is what the error message (unsupported array index type none) is pointing at. Here, you should just use vector.reshape((-1,1)) or np.expand_dims(vector,1) instead, which gives:

    import numpy as np
    from numba import jit

    @jit(nopython=True)
    def simul_nb(matrix, vector):
        # reshape((-1,1)) turns the (n,) vector into an (n,1) column
        return np.hstack((matrix,vector.reshape((-1,1))))
    
    >>> new_mat = simul_nb(mat, vec)
    >>> new_mat.shape
    (107196, 47)
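To check both suggested replacements on something small, here is a minimal sketch that compares them against the plain NumPy [:,None] result. The names append_column_reshape, append_column_expand, mat_small and vec_small are made up for illustration, and the fallback decorator is an assumption added only so the snippet still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for illustration: without Numba, run the functions as plain Python.
    def njit(func):
        return func


@njit
def append_column_reshape(matrix, vector):
    # reshape((-1, 1)) turns shape (n,) into (n, 1) without None indexing
    return np.hstack((matrix, vector.reshape((-1, 1))))


@njit
def append_column_expand(matrix, vector):
    # np.expand_dims(vector, 1) inserts a new axis at position 1
    return np.hstack((matrix, np.expand_dims(vector, 1)))


# Small stand-ins for mat and vec (hypothetical data)
mat_small = np.arange(6, dtype=np.float64).reshape(3, 2)
vec_small = np.array([0.23, 0.0, 0.28])

out_reshape = append_column_reshape(mat_small, vec_small)
out_expand = append_column_expand(mat_small, vec_small)

# Reference result computed outside the jitted code, where [:, None] is fine
expected = np.hstack((mat_small, vec_small[:, None]))

print(out_reshape.shape)
print(np.array_equal(out_reshape, expected), np.array_equal(out_expand, expected))
```

Both variants only change how the column is built; the np.hstack call itself is the same as in the question.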