Tags: python, python-multiprocessing, deque

multiprocessing iterator, filter what gets added to the deque


I am running a large comparison, but approximately 25% of the total run time is spent cleaning the deque after the comparisons are complete. My code looks something like this:

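from collections import deque
from itertools import combinations
from multiprocessing import Pool
from _map_workflow import val_comp

if __name__ == '__main__':
    pool = Pool()
    # index_tups is defined elsewhere; every pair is compared in a worker process
    records = deque(pool.imap_unordered(val_comp, combinations(index_tups,2)))

    # the slow cleanup step: one remove() call per None
    for _ in range(records.count(None)):
        records.remove(None)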

The comparison function val_comp only returns a value if certain criteria are met, so the deque gets loaded with None whenever nothing is returned. Since the pool is using imap_unordered, I am unsure how to filter what gets added to the deque.

Is there a faster/more efficient way to remove these None values, or to prevent them from being added in the first place?


Solution

  • .remove is an O(N) operation for deque objects.

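    So, overall, if there are M None values, the cleanup loop has O(M*N) behavior.

    This is totally avoidable. One simple way is to use filter, which drops the None values as the results stream in, so they never reach the deque. Note that filter(None, ...) discards every falsy value (0, False, empty containers), not only None, so it is safe only if val_comp never returns a falsy result: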

    records = deque(filter(None, pool.imap_unordered(val_comp, combinations(index_tups,2))))
    

    If you wanted to filter them out after you already have your records deque, you could do something like:

    records = deque(x for x in records if x is not None)
    

    This creates a new deque in a single O(N) pass.
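
If val_comp can legitimately return falsy values, the two ideas above can be combined: filter the result stream with an explicit `is not None` test before it reaches the deque. The following is only a minimal sketch. `collect_non_none` and `toy_comp` are hypothetical names made up for illustration (`toy_comp` stands in for val_comp), and the built-in `map` stands in for the pool so the snippet runs on its own.

```python
from collections import deque
from itertools import combinations


def collect_non_none(results):
    """Build a deque from any iterable, skipping only results that are None."""
    # The generator is consumed lazily, so None never enters the deque.
    return deque(r for r in results if r is not None)


def toy_comp(pair):
    """Hypothetical stand-in for val_comp: returns a value only for some pairs."""
    a, b = pair
    if (a + b) % 2 == 0:
        return a * b  # can be 0, which is falsy but still a real result
    return None


pairs = list(combinations(range(4), 2))

# Explicit None test: keeps the falsy result 0.
records = collect_non_none(map(toy_comp, pairs))

# filter(None, ...) for comparison: silently drops the 0 as well.
falsy_dropped = deque(filter(None, map(toy_comp, pairs)))
```

With a real pool, the same helper should accept the iterator from the question directly, e.g. `collect_non_none(pool.imap_unordered(val_comp, combinations(index_tups, 2)))`, since it only needs something iterable.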