I am running a large comparison, but approximately 25% of the total run time is spent cleaning the deque after the comparisons are complete. My code looks something like this:
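from collections import deque
from itertools import combinations
from multiprocessing import Pool
from _map_workflow import val_comp

if __name__ == '__main__':
    pool = Pool()
    # index_tups is defined elsewhere
    records = deque(pool.imap_unordered(val_comp, combinations(index_tups, 2)))
    # Remove the None entries left by comparisons that returned nothing
    for _ in range(records.count(None)):
        records.remove(None)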
The comparison function val_comp only returns values if certain criteria are met, but the deque gets loaded with None when nothing is returned. Since the multiprocessing pool is using imap_unordered, I am unsure how to filter what is getting added to the deque.
Is there a faster/more efficient way to remove these None's or prevent them from being added in the first place?
.remove is an O(N) operation for deque objects. So, overall, if there are M None's, you have O(M*N) behavior.
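To make the cost concrete, here is a rough sketch that times the repeated-remove loop against a single-pass rebuild. The data is synthetic (every other entry is None, an assumption purely for illustration), and the helper names are made up for this example; actual timings will vary by machine.

```python
from collections import deque
from timeit import timeit

def remove_repeatedly(items):
    # The pattern from the question: one linear-time remove per None.
    d = deque(items)
    for _ in range(d.count(None)):
        d.remove(None)
    return d

def rebuild_once(items):
    # Single pass: copy everything that is not None into a new deque.
    return deque(x for x in items if x is not None)

# Synthetic data (an assumption for illustration): every other entry is None.
data = [i if i % 2 else None for i in range(4_000)]

slow = timeit(lambda: remove_repeatedly(data), number=1)
fast = timeit(lambda: rebuild_once(data), number=1)
print(f"repeated remove: {slow:.4f}s, single pass: {fast:.4f}s")
```

Both helpers produce the same deque; only the amount of work differs, and the gap should widen as the input grows.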
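This is totally avoidable. One simple way is to use filter:
records = deque(filter(None, pool.imap_unordered(val_comp, combinations(index_tups,2))))
Note that filter(None, ...) drops every falsy value (0, empty strings, empty containers), not only None, so it is appropriate only if val_comp never returns such a value as a real result.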
If you wanted to filter them out after you already have your records deque, you could do something like:
records = deque(x for x in records if x is not None)
which creates a new deque.
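The same explicit "is not None" test can also be applied while the results stream in, so the None's never reach the deque. A minimal sketch, where fake_results is a hypothetical stand-in for pool.imap_unordered(val_comp, ...) and keep_not_none is a name invented for this example:

```python
from collections import deque

def fake_results():
    # Hypothetical stand-in for pool.imap_unordered(val_comp, ...):
    # yields a mix of real results and None.
    yield from [(1, 2), None, 0, None, (3, 4), ""]

def keep_not_none(results):
    # Build the deque in a single pass, skipping only None.
    return deque(r for r in results if r is not None)

# Keeps falsy-but-real results such as 0 and "".
records = keep_not_none(fake_results())

# For comparison: filter(None, ...) discards every falsy value.
filtered = deque(filter(None, fake_results()))
```

In the real code, the generator expression would presumably wrap the imap_unordered call directly; which variant to use depends on whether val_comp can return falsy values that should be kept.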