python numpy matrix one-hot-encoding

Convert a One-Hot Encoded Matrix to a Single Integer-Encoded Array


I have an existing matrix:

array([[0, 1, 0, ..., 0, 1, 0],
       [0, 0, 1, ..., 0, 0, 1],
       [1, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 1, 0, 0]])

I want to collapse the rows into a single array with one entry per column, where each entry is the 1-based number of the row that holds the 1 in that column.

Expected Output

[3, 1, 2, ..., 4, 1, 2]

Edge Cases (bonus)

If you want to help a little more, I may have cases in which the matrix is the following:

array([[0, 1, 0, ..., 0, 1, 0],
       [0, 0, 0, ..., 0, 0, 1],
       [1, 0, 0, ..., 0, 0, 0],
       [0, 1, 0, ..., 1, 0, 0]])

Here, indexing from 0, column 1 contains two 1s and column 2 contains none.

In these cases, I would like the following behaviour:

  • Two rows set -> Return a new value for that combination (e.g. 5, 6, 7)
  • No rows set -> Return a new value for that case (e.g. 0)

Solution

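  • For the first question, assuming every column contains exactly one 1 (a is the matrix):

    import numpy as np

    # 1-based row number of the first maximum in each column
    np.argmax(a, axis=0) + 1

    Note that argmax returns 0 for an all-zero column, so an empty column would also be reported as 1.

    For the extended question:

    def get_idx(x):
        # row indices of the nonzero entries in this column
        ret = np.where(x)
        # 1-based row numbers, or [0] if the column is empty
        return ret[0] + 1 if len(ret[0]) else np.array([0])

    # one array of row numbers per column
    [get_idx(a[:, i]) for i in range(a.shape[1])]

    # out:
    # [array([3]), array([1, 4]), array([0]), array([4]), array([1]), array([2])]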
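
The answer above returns one array per column rather than a single integer per column. If a flat integer array is needed, as the question asks for in the edge cases, one possible sketch follows. The function name `encode_columns` and the numbering scheme for combinations are assumptions, not part of the original answer: empty columns get 0, columns with a single 1 get the 1-based row number, and every distinct combination of several rows gets a new code starting at `n_rows + 1`. The combination codes depend on which combinations occur in the given matrix, so they are not stable across different inputs.

```python
import numpy as np

def encode_columns(a):
    """Hypothetical helper: map each column of a 0/1 matrix to one integer."""
    a = np.asarray(a).astype(int)
    n_rows = a.shape[0]
    counts = a.sum(axis=0)                    # number of 1s per column
    codes = np.zeros(a.shape[1], dtype=int)   # empty columns stay 0

    # Columns with exactly one 1: use the 1-based row number.
    single = counts == 1
    if single.any():
        codes[single] = np.argmax(a[:, single], axis=0) + 1

    # Columns with several 1s: build a bitmask per column, then number
    # the distinct bitmasks n_rows + 1, n_rows + 2, ... (assumed scheme).
    multi = counts > 1
    if multi.any():
        weights = 2 ** np.arange(n_rows)
        masks = weights @ a[:, multi]
        _, inverse = np.unique(masks, return_inverse=True)
        codes[multi] = n_rows + 1 + inverse.reshape(-1)

    return codes

a = np.array([[0, 1, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 1],
              [1, 0, 0, 0, 0, 0],
              [0, 1, 0, 1, 0, 0]])
print(encode_columns(a))
```

For the six visible columns of the edge-case matrix, this should give 3, 5, 0, 4, 1, 2: column 1 (rows 1 and 4) is the only combination present, so it receives the first new code, 5.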