Double or More Multiprocessing?


I am working through some files that are within a folder hierarchy of Years>Months>Days (where days are the actual files I'm operating on).

Right now I am parallelizing at the day level, so I process eight files at once, but I'm wondering whether I can also parallelize the outer levels, over months and even years. Can I do something like:

pool = Pool()
pool.starmap(convertYears, years)

Then, within the convertYears function:

pool = Pool()
pool.starmap(convertMonths, months)

Then, within the convertMonths function:

pool = Pool()
pool.starmap(convertDays, files)

I don't know much about how parallelization works, which is why I'm asking here.


Solution

  • Not with Pool as written, and it wouldn't help even if it worked. Pool's worker processes are daemonic, and a daemonic process is not allowed to start children, so calling Pool() inside convertYears fails with "AssertionError: daemonic processes are not allowed to have children". Even if nesting were allowed, it buys nothing. By default Pool starts one worker per available CPU, and the CPU count is the maximum number of tasks that can actually run at once. With 8 CPUs, convertYears would start 8 processes; each of those would start 8 more for convertMonths (64 at that level); and each of those would start 8 more for convertDays (512 at the innermost level). Every one of those processes costs time and memory to create, yet only 8 can run at any moment. Limiting each level with Pool(2) doesn't help either: 2 × 2 × 2 is still 8 processes doing the actual work, which is exactly where you started.

    Long story short, stick to parallelizing at one level: the program will be just as fast, because the number of CPUs (not the number of pools) caps how much work runs at once, and your CPU won't hate you.
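A minimal sketch of the single-level approach, assuming the day files sit exactly three levels below a root folder (root/<year>/<month>/<file>): list the whole tree once in the parent process, then hand every day file to one Pool. The names convert_day, collect_day_files, convert_all and make_demo_tree are hypothetical; convert_day is only a placeholder for the real per-file work (here it just reports each file's size).

```python
import tempfile
from multiprocessing import Pool
from pathlib import Path


def convert_day(path):
    """Hypothetical per-file worker standing in for convertDays.

    Placeholder work: return the file name and its size in bytes.
    """
    p = Path(path)
    return p.name, p.stat().st_size


def collect_day_files(root):
    """Flatten root/<year>/<month>/<file> into one sorted list of paths."""
    return sorted(str(p) for p in Path(root).glob("*/*/*") if p.is_file())


def convert_all(root, processes=None):
    """Process every day file with a single Pool (no nesting).

    processes=None lets Pool start os.cpu_count() workers, and files from
    every year and month share the same work queue.
    """
    files = collect_day_files(root)
    with Pool(processes=processes) as pool:
        # map passes one argument per call; chunksize batches small tasks
        # to reduce inter-process communication overhead.
        return pool.map(convert_day, files, chunksize=4)


def make_demo_tree(root):
    """Hypothetical helper: build a tiny Years>Months>Days tree for testing."""
    for year in ("2020", "2021"):
        for month in ("01", "02"):
            month_dir = Path(root, year, month)
            month_dir.mkdir(parents=True)
            for day in ("01", "02", "03"):
                (month_dir / f"{day}.txt").write_text("x" * 5)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_demo_tree(root)
        results = convert_all(root)
        print(len(results), "files converted")
```

Listing directories is usually cheap compared with converting files, so the year and month levels rarely need parallelizing at all. Putting every file into one queue also keeps all CPUs busy even when some months contain fewer days than others.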