Can numpy bincount work with 2D arrays?


I am seeing behaviour with numpy bincount that I cannot make sense of. I want to bin the values of a 2D array row by row, and I see the behaviour below. Why does it work with dbArray but fail with simarray?

>>> dbArray
array([[1, 0, 1, 0, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 0, 1, 1],
       [1, 0, 0, 0, 0],
       [0, 0, 0, 1, 1],
       [0, 1, 0, 1, 0]])
>>> N.apply_along_axis(N.bincount,1,dbArray)
array([[2, 3],
       [0, 5],
       [1, 4],
       [4, 1],
       [3, 2],
       [3, 2]], dtype=int64)
>>> simarray
array([[2, 0, 2, 0, 2],
       [2, 1, 2, 1, 2],
       [2, 1, 1, 1, 2],
       [2, 0, 1, 0, 1],
       [1, 0, 1, 1, 2],
       [1, 1, 1, 1, 1]])
>>> N.apply_along_axis(N.bincount,1,simarray)

Traceback (most recent call last):
  File "<pyshell#31>", line 1, in <module>
    N.apply_along_axis(N.bincount,1,simarray)
  File "C:\Python27\lib\site-packages\numpy\lib\shape_base.py", line 118, in apply_along_axis
    outarr[tuple(i.tolist())] = res
ValueError: could not broadcast input array from shape (2) into shape (3)

Solution

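  • The problem is that bincount doesn't always return arrays of the same shape. Without minlength, np.bincount(x) returns an array of length max(x) + 1, so a row whose largest value is smaller than in the other rows (for example, because a value is missing from that row) produces a shorter result, and apply_along_axis can't fit it into the output array it allocated from the first row's result. For example: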

    >>> m = np.array([[0,0,1],[1,1,0],[1,1,1]])
    >>> np.apply_along_axis(np.bincount, 1, m)
    array([[2, 1],
           [1, 2],
           [0, 3]])
    >>> [np.bincount(row) for row in m]
    [array([2, 1]), array([1, 2]), array([0, 3])]
    

    works, but:

    >>> m = np.array([[0,0,0],[1,1,0],[1,1,0]])
    >>> m
    array([[0, 0, 0],
           [1, 1, 0],
           [1, 1, 0]])
    >>> [np.bincount(row) for row in m]
    [array([3]), array([1, 2]), array([1, 2])]
    >>> np.apply_along_axis(np.bincount, 1, m)
    Traceback (most recent call last):
      File "<ipython-input-49-72e06e26a718>", line 1, in <module>
        np.apply_along_axis(np.bincount, 1, m)
      File "/usr/local/lib/python2.7/dist-packages/numpy/lib/shape_base.py", line 117, in apply_along_axis
        outarr[tuple(i.tolist())] = res
    ValueError: could not broadcast input array from shape (2) into shape (1)
    

    won't.

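    You can use the minlength parameter to force every result to the same length, passing it through a lambda or functools.partial. minlength must be at least the largest value in the whole array plus one (2 for this m; 3 for simarray in the question):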

    >>> np.apply_along_axis(lambda x: np.bincount(x, minlength=2), axis=1, arr=m)
    array([[3, 0],
           [1, 2],
           [1, 2]])
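Applied to simarray from the question, the minlength fix might look like the minimal sketch below. It assumes the array holds only non-negative integers and derives minlength from the global maximum instead of hard-coding it; the names nbins, lengths and row_counts are just illustrative. functools.partial is used here in place of the lambda, which is equivalent.

```python
from functools import partial

import numpy as np

# simarray from the question: values range over 0..2.
simarray = np.array([[2, 0, 2, 0, 2],
                     [2, 1, 2, 1, 2],
                     [2, 1, 1, 1, 2],
                     [2, 0, 1, 0, 1],
                     [1, 0, 1, 1, 2],
                     [1, 1, 1, 1, 1]])

# Without minlength, each row's result has length row.max() + 1.
# The last row contains only 1s, so it gets 2 bins instead of 3,
# which is the shape (2) vs shape (3) mismatch in the traceback.
lengths = [len(np.bincount(row)) for row in simarray]
print(lengths)  # [3, 3, 3, 3, 3, 2]

# Pin every row's output to the global maximum + 1 bins.
nbins = simarray.max() + 1
row_counts = np.apply_along_axis(partial(np.bincount, minlength=nbins), 1, simarray)
print(row_counts)  # 6 x 3 array; the last row is [0 5 0]
```

Using the global maximum means every row gets the same number of bins regardless of which values it happens to contain.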
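If calling bincount once per row is too slow, a different technique (swapped in here, not part of the original answer) avoids apply_along_axis entirely: shift row i's values by i * nbins so each row lands in its own block of bins, run a single 1D bincount on the flattened array, and reshape. This is a sketch under the assumption that the input is a 2D array of non-negative integers; bincount_rows is a hypothetical helper name.

```python
import numpy as np


def bincount_rows(arr, nbins=None):
    """Row-wise bincount for a 2D array of non-negative integers (hypothetical helper).

    Row i's values are shifted by i * nbins so every row occupies its own
    block of nbins consecutive bins; one 1D bincount then counts all rows
    at once, and reshaping splits the blocks back into rows.
    """
    arr = np.asarray(arr)
    if arr.min() < 0:
        raise ValueError("values must be non-negative")
    if nbins is None:
        # Same width for every row: the global maximum plus one.
        nbins = int(arr.max()) + 1
    elif arr.max() >= nbins:
        # Otherwise counts would spill into the next row's block.
        raise ValueError("nbins must exceed the largest value")
    n_rows = arr.shape[0]
    offsets = np.arange(n_rows)[:, None] * nbins  # shape (n_rows, 1), broadcast over columns
    # minlength makes the final block complete even if the last row's
    # largest value is below nbins - 1.
    flat_counts = np.bincount((arr + offsets).ravel(), minlength=n_rows * nbins)
    return flat_counts.reshape(n_rows, nbins)


dbArray = np.array([[1, 0, 1, 0, 1],
                    [1, 1, 1, 1, 1],
                    [1, 1, 0, 1, 1],
                    [1, 0, 0, 0, 0],
                    [0, 0, 0, 1, 1],
                    [0, 1, 0, 1, 0]])
simarray = np.array([[2, 0, 2, 0, 2],
                     [2, 1, 2, 1, 2],
                     [2, 1, 1, 1, 2],
                     [2, 0, 1, 0, 1],
                     [1, 0, 1, 1, 2],
                     [1, 1, 1, 1, 1]])

print(bincount_rows(dbArray))   # same counts as apply_along_axis gives in the question
print(bincount_rows(simarray))  # 6 x 3; no shape error, last row is [0 5 0]
```

The trade-off is memory: the intermediate count array has n_rows * nbins entries, which is fine for small value ranges but wasteful if the values are large and sparse.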