Tags: python, performance, cplex, docplex

Python: Is there a faster way to filter a dataframe in a for loop?


I have a for loop over a list of tuples in Python, where each tuple is an index key into a dataframe. The values found under each key are then summed over.

Example:

from time import time  # needed for the timing below

import numpy as np

t3 = time()
for row in list_of_index:
    a = []
    if row[-1] != 'end':
        a = df1.loc[row[:7]].to_numpy()
    b = df2.loc[row[:6]].to_numpy()
    print(solver.add_constraint(
        solver.sum(dvSupportedOuts[row[0], row[6], row[1],
                                   row2[0], row[2], row2[1], row2[2], row[3], row[4],
                                   row2[3], row[5], row2[4], row2[5], row2[6], row2[7], row2[8]]
                   for row2 in np.reshape(a, (-1, 11)))
        == solver.sum(dvSupportedOuts[row[0], row2[0], row[1],
                                      row[2], row2[1], row[3], row[4], row2[2], row2[3],
                                      row[5], row2[4], row2[5], row2[6], row2[7], row2[8], row2[9]]
                      for row2 in np.reshape(b, (-1, 12)))  # size of step_sequence
        + dvMBDown[row[0], row[6], row[1], row[2], row[3], row[4], row[5]]
        - dvMBUp[row[0], row[6], row[1], row[2], row[3], row[4], row[5]]))
print(area_name + " mb:", round(time() - t3, 1), "s")

This is the constraint I am building, and it produces more than 100k constraints. It takes 40 minutes to run this code even with threading. The .loc filter reduces the amount of data the inner loop in solver.sum() has to process. Is there a faster way to filter values inside the for loop? (This is for docplex constraint creation.) Thanks
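To see why the filter dominates the runtime, here is a small micro-benchmark (the names and sizes are illustrative assumptions, not the questioner's data) comparing a per-row `df.loc` lookup against a dict built once up front:

```python
from time import time

import numpy as np
import pandas as pd

# Illustrative micro-benchmark (names and sizes are assumptions):
# per-row df.loc lookups vs. a pre-built dict.
n = 20_000
idx = pd.Index([f"k{i}" for i in range(n)])
df = pd.DataFrame({"v": np.arange(n)}, index=idx)

# Repeated .loc calls pay pandas' indexing overhead on every lookup.
t0 = time()
total_loc = sum(df.loc[k, "v"] for k in idx)
t_loc = time() - t0

# A plain dict built once turns each lookup into a single hash probe.
lookup = df["v"].to_dict()
t0 = time()
total_dict = sum(lookup[k] for k in idx)
t_dict = time() - t0

print(f".loc: {t_loc:.3f}s  dict: {t_dict:.3f}s")
```

Both loops compute the same total; the dict version is typically orders of magnitude faster because it skips pandas' label-resolution machinery on every iteration.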


Solution

  • I was able to make it 18x faster; most of the time was being spent in the filter (.loc). First I changed the index from a MultiIndex to a single index that concatenates all the index columns, which gave a 3x speed improvement. Then I converted the dataframe to a dictionary, bringing the total speed-up to about 15x: I can now create the model constraints in 3 minutes instead of 45. I am also already threading (parallel processing) this over one of the index levels. I have usually not seen a significant speed improvement from add_constraints(), but sometimes it helps. In short: 50k constraints from data with 600k rows in 3 minutes. If required I can share the logic.
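The two steps in the answer can be sketched as follows. This is a minimal illustration with a toy frame standing in for df1 (the level names, sizes, and the "|" separator are assumptions); the idea is to flatten the MultiIndex into one concatenated key and pre-group the rows into a dict so the hot loop does a hash lookup instead of a .loc call:

```python
import numpy as np
import pandas as pd

# Toy stand-in for df1: a MultiIndex frame where .loc on a key prefix
# returns a block of rows (levels and sizes are illustrative assumptions).
idx = pd.MultiIndex.from_product([[0, 1], [0, 1], ["x", "y", "z"]])
df = pd.DataFrame(np.arange(len(idx) * 2).reshape(-1, 2),
                  columns=["u", "v"], index=idx)

# Step 1: build one concatenated string key from the index levels you
# filter on (here the first two), replacing the MultiIndex lookup.
keys = ["|".join(map(str, t[:2])) for t in df.index]

# Step 2: pre-group the row values into a dict of NumPy arrays once,
# so the constraint loop does a hash probe instead of df.loc.
lookup = {key: block.to_numpy() for key, block in df.groupby(keys)}

# Inside the loop, df.loc[row[:2]].to_numpy() becomes:
a = lookup["0|1"]
```

Building `lookup` is a one-time cost; every lookup inside the constraint loop afterwards is O(1), which is where the bulk of the 15x speed-up comes from.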