I have a numpy array of shape (192, 224, 192, 1). The last dimension is the integer class that I would like to one-hot encode. For example, if I have 12 classes, I would like the shape of the resulting array to be (192, 224, 192, 12), with the last dimension being all zeros except for a 1 at the index corresponding to the original value.
I can do this naively with many for loops, but would like to know if there is a better way to do this.
You can do this in a single indexing operation if you know the max. Given an array a and m = a.max() + 1:
out = np.zeros(a.shape[:-1] + (m,), dtype=bool)
out[(*np.indices(a.shape[:-1], sparse=True), a[..., 0])] = True
It's easier if you remove the unnecessary trailing dimension:
a = np.squeeze(a)
out = np.zeros(a.shape + (m,), bool)
out[(*np.indices(a.shape, sparse=True), a)] = True
The explicit tuple in the index is necessary to do star expansion on Python versions before 3.11, where unpacking directly inside square brackets is a syntax error.
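As a quick check of the squeezed version, here is a minimal sketch on a small random array. The names `labels`, `num_classes` and `reference` are made up for illustration, and fixing the class count up front (rather than using a.max() + 1) is an assumption that matters when the largest class happens not to occur in the data. The comparison against np.eye is a different technique, used here only as a cross-check.

```python
import numpy as np

# Hypothetical input: a small label volume with a trailing singleton axis.
# num_classes is fixed in advance (an assumption; the answer uses a.max() + 1).
num_classes = 12
rng = np.random.default_rng(0)
labels = rng.integers(0, num_classes, size=(4, 5, 6, 1))

# Drop only the trailing axis, so other length-1 axes would be preserved.
a = np.squeeze(labels, axis=-1)

# One sparse index array per existing axis, plus the labels for the new axis.
out = np.zeros(a.shape + (num_classes,), dtype=bool)
out[(*np.indices(a.shape, sparse=True), a)] = True

# Cross-check: row k of the identity matrix is the one-hot vector for class k.
reference = np.eye(num_classes, dtype=bool)[a]
assert np.array_equal(out, reference)
```

Every position along the last axis of `out` then holds exactly one True, at the index given by the original label.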
If you want to extend this to an arbitrary dimension, you can do that too. The following will insert a new dimension into the squeezed array at axis. Here axis is the position in the final array of the new axis, which is consistent with, say, np.stack, but not consistent with list.insert:
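def onehot(a, axis=-1, dtype=bool):
    # Convert axis to a non-negative position in the *output* array
    pos = axis if axis >= 0 else a.ndim + axis + 1
    shape = list(a.shape)
    shape.insert(pos, a.max() + 1)
    out = np.zeros(shape, dtype)
    # Sparse index arrays for the existing axes, with the labels at the new axis
    ind = list(np.indices(a.shape, sparse=True))
    ind.insert(pos, a)
    out[tuple(ind)] = True
    return out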
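The axis convention described above can be seen in isolation with a small sketch. The helper `final_position` and the variables `n`, `naive` and `fixed` are hypothetical names for illustration; the helper just repeats the `pos` computation from onehot.

```python
import numpy as np

a = np.zeros((3, 2))
n = 5  # hypothetical length of the new axis

# np.stack interprets axis as a position in the *result*.
assert np.stack([a] * n, axis=-1).shape == (3, 2, 5)
assert np.stack([a] * n, axis=-2).shape == (3, 5, 2)

# list.insert interprets a negative index relative to the list *before*
# insertion, so -1 places the new item in front of the last element.
naive = list(a.shape)
naive.insert(-1, n)


def final_position(axis, ndim):
    """Map an axis of the output array to a list.insert position."""
    return axis if axis >= 0 else ndim + axis + 1


fixed = list(a.shape)
fixed.insert(final_position(-1, a.ndim), n)

print(naive)  # [3, 5, 2]
print(fixed)  # [3, 2, 5]
```

The `+ 1` in the conversion accounts for the output having one more dimension than the input.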
If you have a singleton dimension to expand, the generalized solution can find the first available singleton dimension:
def onehot2(a, axis=None, dtype=bool):
    shape = np.array(a.shape)
    if axis is None:
        # First singleton axis (argmax returns 0 if there is none)
        axis = (shape == 1).argmax()
    if shape[axis] != 1:
        raise ValueError(f'Dimension at {axis} is non-singleton')
    shape[axis] = a.max() + 1
    out = np.zeros(shape, dtype)
    ind = list(np.indices(a.shape, sparse=True))
    # Replace the index along the singleton axis with the labels themselves
    ind[axis] = a
    out[tuple(ind)] = True
    return out
To use the last available singleton, replace axis = (shape == 1).argmax() with
axis = a.ndim - 1 - (shape[::-1] == 1).argmax()
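To make the difference between the two expressions concrete, here is a small sketch on a shape with two singleton axes. The variable names are illustrative only. It also shows why the `shape[axis] != 1` check in onehot2 is still needed: argmax on an all-False array returns 0, so neither expression signals a missing singleton by itself.

```python
import numpy as np

shape = np.array((3, 1, 2, 1))

# First singleton: argmax returns the index of the first True.
first = (shape == 1).argmax()

# Last singleton: search the reversed shape, then map the index back.
last = shape.size - 1 - (shape[::-1] == 1).argmax()

print(first, last)  # 1 3

# With no singleton axis at all, both expressions still return an index.
no_singleton = np.array((3, 2))
first_none = (no_singleton == 1).argmax()
last_none = no_singleton.size - 1 - (no_singleton[::-1] == 1).argmax()

print(first_none, last_none)  # 0 1
```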
Here are some example usages:
>>> np.random.seed(0x111)
>>> x = np.random.randint(5, size=(3, 2))
>>> x
array([[2, 3],
       [3, 1],
       [4, 0]])
>>> a = onehot(x, axis=-1, dtype=int)
>>> a.shape
(3, 2, 5)
>>> a
array([[[0, 0, 1, 0, 0],   # 2
        [0, 0, 0, 1, 0]],  # 3
       [[0, 0, 0, 1, 0],   # 3
        [0, 1, 0, 0, 0]],  # 1
       [[0, 0, 0, 0, 1],   # 4
        [1, 0, 0, 0, 0]]]) # 0
>>> b = onehot(x, axis=-2, dtype=int)
>>> b.shape
(3, 5, 2)
>>> b
array([[[0, 0],
        [0, 0],
        [1, 0],
        [0, 1],
        [0, 0]],
       [[0, 0],
        [0, 1],
        [0, 0],
        [1, 0],
        [0, 0]],
       [[0, 1],
        [0, 0],
        [0, 0],
        [0, 0],
        [1, 0]]])
onehot2 requires you to mark the dimension you want to add as a singleton:
>>> np.random.seed(0x111)
>>> y = np.random.randint(5, size=(3, 1, 2, 1))
>>> y
array([[[[2],
         [3]]],
       [[[3],
         [1]]],
       [[[4],
         [0]]]])
>>> c = onehot2(y, axis=-1, dtype=int)
>>> c.shape
(3, 1, 2, 5)
>>> c
array([[[[0, 0, 1, 0, 0],
         [0, 0, 0, 1, 0]]],
       [[[0, 0, 0, 1, 0],
         [0, 1, 0, 0, 0]]],
       [[[0, 0, 0, 0, 1],
         [1, 0, 0, 0, 0]]]])
>>> d = onehot2(y, axis=-2, dtype=int)
ValueError: Dimension at -2 is non-singleton
>>> e = onehot2(y, dtype=int)
>>> e.shape
(3, 5, 2, 1)
>>> e.squeeze()
array([[[0, 0],
        [0, 0],
        [1, 0],
        [0, 1],
        [0, 0]],
       [[0, 0],
        [0, 1],
        [0, 0],
        [1, 0],
        [0, 0]],
       [[0, 1],
        [0, 0],
        [0, 0],
        [0, 0],
        [1, 0]]])
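As a sanity check on results like the ones above, a one-hot array can be decoded again with argmax along the same axis. The sketch below repeats onehot so that it runs by itself, then confirms the round trip for every valid axis of a small input; the loop and the variable names are illustrative, not part of the original answer.

```python
import numpy as np


def onehot(a, axis=-1, dtype=bool):
    # Same logic as onehot above, repeated so this snippet is self-contained.
    pos = axis if axis >= 0 else a.ndim + axis + 1
    shape = list(a.shape)
    shape.insert(pos, a.max() + 1)
    out = np.zeros(shape, dtype)
    ind = list(np.indices(a.shape, sparse=True))
    ind.insert(pos, a)
    out[tuple(ind)] = True
    return out


x = np.array([[2, 3],
              [3, 1],
              [4, 0]])

# axis refers to the output array, which has x.ndim + 1 dimensions.
for axis in (0, 1, 2, -1, -2, -3):
    encoded = onehot(x, axis=axis, dtype=int)
    # Exactly one entry is set along the new axis ...
    assert (encoded.sum(axis=axis) == 1).all()
    # ... and its position recovers the original label.
    assert np.array_equal(encoded.argmax(axis=axis), x)
```

Because axis is defined relative to the output array, the same value can be passed unchanged to sum and argmax.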