python arrays numpy matrix-indexing

Fill values into an array given indexes array and values array in Numpy


I have an array indexes that for each row contains the columns that should be filled. For example:

[array([[    2, 14098,  6824, 24207,  1215],
   [   51,  1277,  3197,  1052,  4076],......

And I have another array values containing the values that should be filled in those positions. For example:

array([[1, 7, 75, 82, 11],
       [11, 5, 8, 82, 811],...

This means that for row 0, column 2 should be filled with value '1', column 14098 should be filled with value '7'... for row 1, column 51 should be filled with value '11', column 1277 should be filled with value '5'...

And a third array, a = np.zeros((100000, 100000)) that is the array to be filled given the two previous arrays.

Right now I am using a nested loop, but I'm pretty sure there is a better way to do it:

for row_idx in range(indexes.shape[0]):
    for col_idx in range(indexes.shape[1]):
        column = indexes[row_idx][col_idx]
        a[row_idx][column] = values[row_idx][col_idx]

How can I fill the array using python/numpy (fancy indexing, broadcasting...) style? What is the most memory-efficient way to do it since I have limited ram?

Thanks for your help in advance!


Solution

  • This can be done with np.put_along_axis

    Put values into the destination array by matching 1d index and data slices. This iterates over matching 1d slices oriented along the specified axis in the index and data arrays, and uses the former to place values into the latter. These slices can be different lengths.

    The following example, borrowed from elsewhere, shows how it works:

    In [50]: df
    Out[50]: 
       datetime1  datetime2  datetime3  datetime4
    1          5          6          5          5
    2          7          2          3          5
    3          4          2          3          2
    4          6          4          4          7
    5          7          3          8          9
    
    In [51]: index_arr = np.array([3, 2, 0 ,1 ,2])
    
    In [52]: replace_arr = np.array([14, 12, 23, 17 ,15])
    
    In [53]: np.put_along_axis(df.to_numpy(),index_arr[:,None],replace_arr[:,None],axis=1)
    
    In [54]: df
    Out[54]: 
       datetime1  datetime2  datetime3  datetime4
    1          5          6          5         14
    2          7          2         12          5
    3         23          2          3          2
    4          6         17          4          7
    5          7          3         15          9
    

    As you can see, the value in the first row, fourth column (position [0, 3]) changed from 5 to 14. The same logic applies to your problem: with your arrays the call would be np.put_along_axis(a, indexes, values, axis=1).
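
To tie this back to the question, here is a minimal sketch on small stand-in arrays. The shapes, the values, and the int32 dtype are assumptions for illustration, not your real data. It fills the array with np.put_along_axis, repeats the fill with plain fancy indexing, and checks both against the original nested loop.

```python
import numpy as np

# Small stand-ins for the question's arrays (illustrative shapes and values).
n_rows, n_cols = 3, 10
indexes = np.array([[2, 7, 4],
                    [5, 1, 9],
                    [0, 3, 8]])
values = np.array([[1, 7, 75],
                   [11, 5, 8],
                   [82, 811, 3]])

# Assumption: the values fit in int32. A smaller dtype than the default
# float64 cuts the memory needed for the destination array.
a = np.zeros((n_rows, n_cols), dtype=np.int32)

# For every row i and slot j: a[i, indexes[i, j]] = values[i, j]
np.put_along_axis(a, indexes, values, axis=1)

# The same fill written with fancy indexing: a column of row numbers
# broadcasts against the 2-D array of column indexes.
b = np.zeros_like(a)
b[np.arange(n_rows)[:, None], indexes] = values

# Reference result from the nested loop in the question.
c = np.zeros_like(a)
for i in range(n_rows):
    for j in range(indexes.shape[1]):
        c[i, indexes[i, j]] = values[i, j]

print(np.array_equal(a, b), np.array_equal(a, c))
```

Both vectorized versions write into the existing array in place, so as far as I can tell the extra memory is on the order of the indexes array rather than of a. The destination itself is the real cost: a float64 array of shape (100000, 100000) takes 100000 * 100000 * 8 bytes, about 80 GB, so choosing the smallest dtype that holds your values is likely to matter more than the choice between these two calls.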