I'm new to Python and getting used to these really handy implicit array/list actions, so please bear with me. I've completed a proof-of-concept code (120 combinations), but as expected it is experiencing a significant slowdown when working against the full dataset (4 million combinations). The current slowdown is in the following for loop:
for i in Combinations:
PropList = function.CalculateTotalPoints(ItemDict, i)
CombinationSets.loc[Combinations.index(i), 'Density':] = PropList
I'm trying not to make this a wall of text, but provide enough information to understand the goal.
I'm sure there are plenty of "not great Python" things happening here, but my current thoughts are:
Edit 1: I couldn't figure out how to isolate the code enough to make a small example problem. Not ideal for post longevity, but I've added the main code up to the for loop, support functions, and sample data to a Github Gist (Removed)
Edit 2: From what I've been reading elsewhere it seems that generally for loops are not the preferred method of looping in Python. I haven't successfully been able to vectorize my function. However, I did look into the timing for this for loop, the function (has a for loop inside), and the assignment of PropList. When running a small (120 combinations) or a large (4.5 million) set, the for loop and the function have similar (pretty fast) performance. In both cases, the line below is taking the longest. For the large data set, it takes ~1 second.
CombinationSets.loc[Combinations.index(i), 'Density':] = PropList
Edit 3:
I was able to reduce the time for the loop operation by instead of performing the merging operation CombinationSets.loc[ ...
inside the loop, I made a list of the returned PropList
. Once exiting the for loop, I performed the merge operation all at once. As it is currently set up a single run through the for loop is ~0.004s vs. 1 sec. There are likely additional optimazation that could be done, but I'll save that for a separate post.
Move the list to dataframe merge operation outside of for loop. See Edit 3.