Tags: python, numpy, performance, for-loop, matrix

How to speed up assigning values into a 3D matrix in Python 3?


I would like to make my Python 3 program much faster, and I would appreciate any ideas.

Background

I am using Python 3 to visualize a 3D dataset calculated by a Fortran 90 program.

When the program writes the calculated 3D matrix to a text file, it has to flatten it into a 2D table with one row per element.

The output file has this structure:

      Value,  x, y, z

e.g.

   123443.0,  1, 1, 1
   123343.0,  1, 1, 2
   134554.0,  1, 1, 3

Each value is one element of the 3D matrix, and x, y, and z give that element's position (counted from 1) along each axis.
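
As a small illustration of this layout, the sketch below parses the three sample rows shown above and places each value at its 0-based position. The in-memory text and the tiny `(1, 1, 3)` shape are assumptions for illustration only, and the comma delimiter is inferred from the sample rows.

```python
import io

import numpy as np

# The three sample rows from above, held in memory instead of a file.
sample = io.StringIO(
    "123443.0,  1, 1, 1\n"
    "123343.0,  1, 1, 2\n"
    "134554.0,  1, 1, 3\n"
)

rows = np.loadtxt(sample, delimiter=",")
values = rows[:, 0]
# The file uses 1-based (x, y, z) positions; NumPy indices start at 0.
x, y, z = (rows[:, 1:].astype(int) - 1).T

cube = np.zeros((1, 1, 3))  # just large enough for the sample rows
cube[x, y, z] = values

print(cube[0, 0, :])
```

Each row of the file therefore maps to exactly one cell of the array, so the row `134554.0, 1, 1, 3` ends up at `cube[0, 0, 2]`.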

To read this file, I use the following Python 3 code.

import numpy as np
import pandas as pd

input_path = "/user_path/3D_CUBE_data.txt"

#READ DATASET (comma-separated: value, x, y, z)
read_1 = np.loadtxt(input_path, delimiter=",")
read_2 = pd.DataFrame(read_1, columns=["data","x","y","z"])

print("input_shape=", read_2.shape)

#CREATE A 3D MATRIX
#(dis, tra, rows)
dis = 41
tra = 18
rows = 4096

data_1 = np.zeros( shape = (dis, tra, rows) )


x_l = list(range(1,dis+1))
y_l = list(range(1,tra+1))
z_l = list(range(1,rows+1))

#ASSIGN VALUES IN THE 3D MATRIX
for y in y_l :
   for x in x_l :
       line_1 = read_2[(read_2["x"]==int(x)) & (read_2["y"]==int(y))]

       data_1[x-1,y-1,:] = line_1.loc[:,["data"]].T

Assigning the values is the slow part, and I suspect the two nested for-loops are the reason.

Question

How can I speed up this process in Python 3?


Solution

  • You can skip the pandas DataFrame entirely and use NumPy's ravel_multi_index function to convert your coordinates into indices into a flattened version of your required matrix:

    import numpy as np

    # read in the comma separated values (input_path as defined in the question)
    inputdata = np.loadtxt(input_path, delimiter=",")
    
    # extract the values and their coordinates
    values = inputdata[:, 0]
    coords = inputdata[:, 1:].astype(int) - 1  # subtract 1 due to indices starting at 0
    
    # create your matrix
    dis = 41
    tra = 18
    rows = 4096
    
    data_1 = np.zeros(shape=(dis, tra, rows))
    
    # fill in your matrix at the appropriate coordinates
    data_1.flat[np.ravel_multi_index(coords.T, data_1.shape)] = values
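
To see what `np.ravel_multi_index` does here, a minimal sketch on a small made-up cube may help. The shape `(2, 3, 4)`, the shuffled synthetic rows, and the variable names are assumptions for illustration, not part of the original data. The sketch also fills a second array by indexing with one integer array per axis, and compares both against a plain element-by-element loop.

```python
import numpy as np

shape = (2, 3, 4)  # hypothetical small (dis, tra, rows)
rng = np.random.default_rng(0)

# Build synthetic rows of (value, x, y, z) with 1-based coordinates,
# then shuffle them so the row order carries no information.
x, y, z = np.indices(shape).reshape(3, -1) + 1
values = rng.random(x.size)
table = np.column_stack([values, x, y, z])
rng.shuffle(table)  # shuffles the rows in place

vals = table[:, 0]
coords = table[:, 1:].astype(int) - 1  # back to 0-based indices

# Option 1: convert (x, y, z) triples to positions in the flattened array.
flat_idx = np.ravel_multi_index(tuple(coords.T), shape)
cube_a = np.zeros(shape)
cube_a.flat[flat_idx] = vals

# Option 2: index directly with one integer array per axis.
cube_b = np.zeros(shape)
cube_b[coords[:, 0], coords[:, 1], coords[:, 2]] = vals

# Reference: the slow element-by-element loop.
cube_ref = np.zeros(shape)
for v, (i, j, k) in zip(vals, coords):
    cube_ref[i, j, k] = v

print(np.array_equal(cube_a, cube_ref), np.array_equal(cube_b, cube_ref))
```

Because every value is placed by its own coordinates, neither vectorized option depends on the order of the rows in the file, whereas the loop in the question assumes that the rows for each (x, y) pair are already sorted by z.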