I have an array indexes that, for each row, contains the columns that should be filled. For example:
[array([[ 2, 14098, 6824, 24207, 1215],
[ 51, 1277, 3197, 1052, 4076],......
And I have another array values containing the values that should be filled in those positions. For example:
array([[1, 7, 75, 82, 11],
[11, 5, 8, 82, 811],...
This means that for row 0, column 2 should be filled with value 1, column 14098 with value 7, and so on; for row 1, column 51 should be filled with value 11, column 1277 with value 5, and so on.
And a third array, a = np.zeros((100000, 100000)), which is the array to be filled given the two previous arrays.
Right now I am using a nested loop to do it, but I'm pretty sure there is a better way:
for row_idx in range(indexes.shape[0]):
    for col_idx in range(indexes.shape[1]):
        # column of a that receives values[row_idx, col_idx]
        column = indexes[row_idx, col_idx]
        a[row_idx, column] = values[row_idx, col_idx]
How can I fill the array in Python/NumPy style (fancy indexing, broadcasting, ...)? What is the most memory-efficient way to do it, since I have limited RAM?
Thanks for your help in advance!
This can be done with np.put_along_axis
Put values into the destination array by matching 1d index and data slices. This iterates over matching 1d slices oriented along the specified axis in the index and data arrays, and uses the former to place values into the latter. These slices can be different lengths.
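To make that description concrete, here is a minimal sketch on a tiny array. The names dest, idx and vals are made up for illustration; the point is only that each row of idx names the columns to write in the matching row of dest, and that the call works in place and returns None.

```python
import numpy as np

# Small destination array, filled in place.
dest = np.zeros((2, 4), dtype=int)

# One row of column indices per row of dest, and the values to write there.
idx = np.array([[0, 3],
                [1, 2]])
vals = np.array([[10, 20],
                 [30, 40]])

# axis=1 means the entries of idx are column positions within each row.
result = np.put_along_axis(dest, idx, vals, axis=1)

print(result)  # put_along_axis modifies dest and returns None
print(dest)
```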
See this for an example, taken from here
In [50]: df
Out[50]:
datetime1 datetime2 datetime3 datetime4
1 5 6 5 5
2 7 2 3 5
3 4 2 3 2
4 6 4 4 7
5 7 3 8 9
In [51]: index_arr = np.array([3, 2, 0 ,1 ,2])
In [52]: replace_arr = np.array([14, 12, 23, 17 ,15])
In [53]: np.put_along_axis(df.to_numpy(),index_arr[:,None],replace_arr[:,None],axis=1)
In [54]: df
Out[54]:
datetime1 datetime2 datetime3 datetime4
1 5 6 5 14
2 7 2 12 5
3 23 2 3 2
4 6 17 4 7
5 7 3 15 9
As you can see, the value at row position 0, column position 3 (df.iloc[0, 3]) was changed from 5 to 14, and applying this same logic will work fine for your problem.
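Applied to the arrays in the question, this could look like the sketch below. It uses a scaled-down shape and made-up data so it runs anywhere, and fill_rows is a hypothetical helper name, not something from NumPy. It also shows what should be the equivalent fancy-indexing form, and checks both against the original nested loop. Both variants write into a in place, so neither should need a second array the size of a. Note that a dense float64 array of shape (100000, 100000) takes about 80 GB by itself; a smaller dtype (int32 is used here as an assumption about the data) reduces that, but does not make it small.

```python
import numpy as np

def fill_rows(a, indexes, values):
    """Write values[r, k] into a[r, indexes[r, k]] for every r, k (in place)."""
    np.put_along_axis(a, indexes, values, axis=1)
    return a

# Scaled-down stand-ins for the arrays in the question.
n_rows, n_cols = 4, 10
indexes = np.array([[2, 7, 5],
                    [0, 1, 9],
                    [3, 4, 8],
                    [6, 2, 1]])
values = np.array([[1, 7, 75],
                   [11, 5, 8],
                   [82, 811, 3],
                   [4, 6, 9]])

# Variant 1: np.put_along_axis.
a = np.zeros((n_rows, n_cols), dtype=np.int32)
fill_rows(a, indexes, values)

# Variant 2: fancy indexing. rows has shape (n_rows, 1) and broadcasts
# against indexes, so each row index is paired with that row's columns.
b = np.zeros((n_rows, n_cols), dtype=np.int32)
rows = np.arange(n_rows)[:, None]
b[rows, indexes] = values

# Reference: the nested loop from the question.
c = np.zeros((n_rows, n_cols), dtype=np.int32)
for row_idx in range(indexes.shape[0]):
    for col_idx in range(indexes.shape[1]):
        c[row_idx, indexes[row_idx, col_idx]] = values[row_idx, col_idx]

print(np.array_equal(a, c), np.array_equal(b, c))
```

If a row of indexes contains the same column more than once, which of the duplicate values ends up in the array should not be relied on with either vectorized variant.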