Tags: python, multiprocessing, python-multiprocessing, hpc

Python Multiprocessing Parallelize Inner or Outer Loop


Let's say we have some operation like:

groups = ['A', 'B', 'C']
idx = list(range(1000))
for group in groups:
    for i in idx:
        pass  # Compute something

where idx is much larger than groups.

To speed this up, I have looked at multiprocessing and joblib in Python. Should we parallelize over the outer loop (run the iterations of for group in groups in parallel) or over the inner loop (run the iterations of for i in idx in parallel)?


Solution

  • This depends heavily on the number of groups, the number of cores, how expensive the actual computation is, and probably several other factors I'm forgetting. You can avoid having to think about it by creating a single iterator that produces every (group, i) tuple the inner loop would see, i.e. by collapsing the two loops into one. itertools.product (the Cartesian product) does exactly this:

    Rough example:

    from itertools import product
    from multiprocessing import Pool

    # compute_something receives one (group, i) tuple per call
    with Pool() as p:
        results = p.map(compute_something, product(groups, idx))
    

    This should work decently well in most situations.
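
For completeness, here is a slightly fuller sketch of the same idea that can be run as a script. The body of compute_something is a hypothetical placeholder for the real work, and run_all, processes and chunksize=64 are names and values chosen only for illustration. It uses Pool.starmap so that each (group, i) tuple is unpacked into two arguments, and it keeps the pool behind an `if __name__ == "__main__":` guard, which is needed on platforms that start workers with spawn (Windows, macOS).

```python
from itertools import product
from multiprocessing import Pool


def compute_something(group, i):
    # Hypothetical stand-in for the real per-item computation.
    return f"{group}{i}"


def run_all(groups, idx, processes=None, chunksize=64):
    # product() yields (group, i) pairs in the same order as the nested loops.
    # starmap() unpacks each pair into compute_something(group, i) and
    # returns the results as a list in that same order.
    with Pool(processes=processes) as pool:
        return pool.starmap(
            compute_something, product(groups, idx), chunksize=chunksize
        )


if __name__ == "__main__":
    results = run_all(["A", "B", "C"], range(1000))
    print(len(results), results[0], results[-1])
```

Because the work is handed out in chunks of the flattened (group, i) stream, the cores can stay busy even when there are fewer groups than cores. The best chunksize depends on how expensive each call is, so treat the value here as a starting point to tune.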