I have one-hot encoded data of undefined shape within an array with ndim = 3, e.g.:
import numpy as np
arr = np.array([ # Axis 0
[ # Axis 1
[0, 1, 0], # Axis 2
[1, 0, 0],
],
[
[0, 0, 1],
[0, 1, 0],
],
])
What I want is to shuffle the values of a known fraction of the sub-arrays along axis=2.
If this fraction is 0.25, then the result could be:
arr = np.array([
[
[1, 0, 0], # Shuffling happened here
[1, 0, 0],
],
[
[0, 0, 1],
[0, 1, 0],
],
])
I know how to do that using iterative methods like:
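for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        # each sub-array is picked independently with probability 1/4
        if np.random.choice([0, 1, 2, 3]) == 0:
            # arr[i, j] is a view, so the shuffle happens in place
            np.random.shuffle(arr[i, j])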
But this is extremely inefficient for large arrays, since it loops over every sub-array in Python.
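For comparison, the loop's own selection rule (each sub-array picked independently with probability 1/4) can be vectorized with a boolean mask. This is a minimal sketch, not part of the original question: `shuffle_bernoulli` is a hypothetical helper name, and it assumes NumPy >= 1.20 for `Generator.permuted`. Like the loop, it shuffles a random *number* of sub-arrays rather than an exact fraction.

```python
import numpy as np


def shuffle_bernoulli(arr, p, rng=None):
    """Shuffle each sub-array along the last axis with probability p.

    Hypothetical helper: a vectorized equivalent of the nested loop.
    Returns a new array and leaves `arr` untouched.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = arr.copy()
    # One Boolean per sub-array, i.e. shape arr.shape[:-1].
    mask = rng.random(arr.shape[:-1]) < p
    # Boolean indexing gathers the selected sub-arrays as rows of shape
    # (k, arr.shape[-1]); permuted(..., axis=1) shuffles each row independently.
    out[mask] = rng.permuted(out[mask], axis=1)
    return out


arr = np.array([
    [[0, 1, 0], [1, 0, 0]],
    [[0, 0, 1], [0, 1, 0]],
])
print(shuffle_bernoulli(arr, 0.25, rng=np.random.default_rng(0)))
```

With `p=0.25` the expected number of shuffled sub-arrays is a quarter of the total, but any given run may shuffle more or fewer, which is what the edit below addresses.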
Edit: as suggested in the comments, the random selection of the known fraction should follow a uniform law (every sub-array equally likely to be picked).
One approach:
import numpy as np
np.random.seed(42)
# arr is the array from the question
fraction = 0.25
total = arr.shape[0] * arr.shape[1]
# pick, without replacement, the flat indices of the sub-arrays to shuffle
indices = np.random.choice(np.arange(total), size=int(total * fraction), replace=False)
# convert each flat index to the corresponding (axis 0, axis 1) multi-index
multi_indices = np.unravel_index(indices, arr.shape[:2])
# fancy indexing returns a copy (not a view) of the selected sub-arrays, shape (k, arr.shape[2])
selected = arr[multi_indices]
# shuffle each selected row independently by applying argsort to random values of the same shape
shuffled = np.take_along_axis(selected, np.argsort(np.random.random(selected.shape), axis=1), axis=1)
# write the shuffled rows back into the original array
arr[multi_indices] = shuffled
print(arr)
Output (of a single run):
[[[0 1 0]
  [0 0 1]]

 [[0 0 1]
  [0 1 0]]]
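The same steps can be wrapped in a reusable function that works for any number of leading axes (shuffling along the last one). This is a sketch under stated assumptions: `shuffle_fraction` is a hypothetical name, it uses the newer `np.random.default_rng` Generator API (NumPy >= 1.17) instead of the global seed, it expects `arr.ndim >= 2` and `0 <= fraction <= 1`, and it rounds `total * fraction` to the nearest integer instead of truncating with `int()`, so that a product such as `0.29 * 100` (which evaluates to `28.999999999999996`) yields 29 rather than 28.

```python
import numpy as np


def shuffle_fraction(arr, fraction, rng=None):
    """Shuffle round(fraction * n) sub-arrays along the last axis.

    Hypothetical helper. n = prod(arr.shape[:-1]) is the number of
    sub-arrays; they are chosen uniformly at random without replacement.
    Returns a new array and leaves `arr` untouched.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = arr.copy()
    lead_shape = arr.shape[:-1]
    total = int(np.prod(lead_shape))
    k = int(round(total * fraction))
    # Flat indices of the sub-arrays to shuffle, drawn without replacement.
    flat = rng.choice(total, size=k, replace=False)
    multi = np.unravel_index(flat, lead_shape)
    # Fancy indexing copies the selected sub-arrays into shape (k, arr.shape[-1]).
    selected = out[multi]
    # Argsort of uniform noise gives an independent random permutation per row.
    perm = np.argsort(rng.random(selected.shape), axis=1)
    out[multi] = np.take_along_axis(selected, perm, axis=1)
    return out


arr = np.array([
    [[0, 1, 0], [1, 0, 0]],
    [[0, 0, 1], [0, 1, 0]],
])
print(shuffle_fraction(arr, 0.25, rng=np.random.default_rng(42)))
```

Note that a selected sub-array can come out of the shuffle unchanged (for a one-hot row of length 3 this happens with probability 1/3), so the number of visibly modified sub-arrays may be smaller than `k`.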