I have a computation in which I need to go through the items of a 3D NumPy array and add to each one the value at the same position in a fixed row along the second dimension (skipping that row itself). It is analogous to this minimal reproducible example:
import numpy as np

data = np.array([
    [[1, 1, 1], [10, 10, 10], [1, 1, 1]],
    [[2, 2, 2], [20, 20, 20], [2, 2, 2]],
    [[3, 3, 3], [30, 30, 30], [3, 3, 3]]])

def process_data(const_idx, data, i, j, k):
    if const_idx != j:
        # PROBLEM: how can I access this value if this function is vectorized?
        value_to_add = data[i][const_idx][k]
        data[i][j][k] += value_to_add

const_idx = 1
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        for k in range(data.shape[2]):
            process_data(const_idx, data, i, j, k)
print(data)
The expected output in this case is:
[[[11 11 11]
  [10 10 10]
  [11 11 11]]

 [[22 22 22]
  [20 20 20]
  [22 22 22]]

 [[33 33 33]
  [30 30 30]
  [33 33 33]]]
The code above works but it is very slow for large arrays. I would like to vectorize this function.
My first stab is something like this:
def process_data(val, data, const_idx):
    # PROBLEM: How can I access this value given that I do not have access to the i / j / k coordinates val came from?
    value_to_add = ...
    # PROBLEM: I cannot make this check either since I don't know the j index being processed here
    if const_idx != j:
        return val + value_to_add
    else:
        return val

vfunc = np.vectorize(process_data)
result = vfunc(data, data, const_idx)
print(result)
How can I accomplish this, or is perhaps vectorization not the answer?
const_idx points to the index of the row that acts as the addend.
You can perform the in-place addition concisely on the required rows with the following approach:
def add_by_idx(arr, idx):
    r = np.arange(arr.shape[1])  # row indices
    # add row idx (kept as a length-1 axis so it broadcasts) to every other row
    arr[:, r[r != idx], :] += arr[:, [idx], :]

add_by_idx(data, 1)
print(data)
[[[11 11 11]
  [10 10 10]
  [11 11 11]]

 [[22 22 22]
  [20 20 20]
  [22 22 22]]

 [[33 33 33]
  [30 30 30]
  [33 33 33]]]
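If modifying the input in place is not desirable, the same idea can be sketched with a boolean mask and broadcasting. This is only one possible variant: the function name add_row_to_others is hypothetical (it is not a NumPy function), and the sketch assumes a 3D array whose addend row lies along axis 1, as in the question.

```python
import numpy as np

def add_row_to_others(arr, idx):
    """Return a new array with row `idx` (axis 1) added to every other row.

    Hypothetical helper for illustration; assumes a 3D input array.
    """
    # Boolean mask over axis 1: True for every row except the addend row.
    mask = np.arange(arr.shape[1]) != idx
    # Reshape to (1, n_rows, 1) so the mask broadcasts against the 3D array.
    mask = mask[None, :, None]
    # arr[:, [idx], :] keeps axis 1 with length 1, so it broadcasts over rows;
    # np.where substitutes 0 for the addend row itself, leaving it unchanged.
    return arr + np.where(mask, arr[:, [idx], :], 0)

data = np.array([
    [[1, 1, 1], [10, 10, 10], [1, 1, 1]],
    [[2, 2, 2], [20, 20, 20], [2, 2, 2]],
    [[3, 3, 3], [30, 30, 30], [3, 3, 3]]])

result = add_row_to_others(data, 1)  # `data` itself is left untouched
print(result)
```

As for the np.vectorize attempt in the question: the NumPy documentation describes np.vectorize as a convenience wrapper whose implementation is essentially a for loop, so it would not be expected to deliver the speedup that slicing and broadcasting give here.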