Tags: python, python-multiprocessing, pool

Multiprocessing (Pool) on a function with two arguments


I have a function that processes several .txt files and takes only two arguments. It works as intended, but it is slow: after almost an hour it had gotten through only about 10% of the files, because the .txt files are fairly large.

I have been reading about the multiprocessing package, and in particular its Pool class. However, I am not sure how to use it properly.

The code I use to run my function is the following:

for k, structure in enumerate(structures):
    structure_maker(structure_path, structure)

The structure_path is always the same, whereas structures is a list of different values, e.g.:

structures = [1, 3, 6, 7, 8, 10, 13, 25, 27]

So how would I go about using Pool for this? As far as I can tell, I have to do something like:

from multiprocessing import Pool

mypool = Pool(6) # Choosing how many cores I want to use
mypool.map(structure_maker, list)

And the list is where I get lost. What is it supposed to be? The structures list? And if so, where do I put my structure_path?


Solution

  • Pool.map() works like the built-in map() function: it applies the function passed as its first argument to each item of the iterable passed as its second argument. Each time it calls the supplied function, it passes the next item of the iterable as that function's single argument.

    A potential problem in this case is that the function you want to use, structure_maker(), requires two arguments. There are different ways around this, but since one of the arguments is always the same, you can use functools.partial() to create a temporary function that only needs the second argument, and you can do it right in the mypool.map() call.

    Here's what I mean:

    from functools import partial
    from multiprocessing import Pool
    from pprint import pprint
    
    def structure_maker(structure_path, structure):
        """ Dummy for testing. """
        return structure_path, structure
    
    if __name__ == '__main__':
    
        structure_path = 'samething'
        structures = [1, 3, 6, 7, 8, 10, 13, 25, 27]
        # partial() fixes the first argument, so map() only has to supply the second.
        with Pool(4) as mypool:  # the with block shuts the pool down afterwards
            results = mypool.map(partial(structure_maker, structure_path), structures)
        pprint(results)
    

    Output:

    [('samething', 1),
     ('samething', 3),
     ('samething', 6),
     ('samething', 7),
     ('samething', 8),
     ('samething', 10),
     ('samething', 13),
     ('samething', 25),
     ('samething', 27)]
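
As one of the "different ways around this" mentioned above, here is a minimal sketch using Pool.starmap() (available since Python 3.3), which unpacks each item of the iterable into several positional arguments. The helper name run_all and its processes parameter are made up for illustration, and structure_maker is the same dummy as before, not the real function from the question.

```python
from itertools import repeat
from multiprocessing import Pool


def structure_maker(structure_path, structure):
    """Dummy stand-in for the real function (assumption, as in the example above)."""
    return structure_path, structure


def run_all(structure_path, structures, processes=4):
    """Call structure_maker(structure_path, s) for every s in structures, in parallel.

    run_all and processes are hypothetical names used only for this sketch.
    """
    # zip() pairs the constant path with each structure value;
    # starmap() unpacks each (path, structure) pair into two positional arguments.
    with Pool(processes) as pool:
        return pool.starmap(structure_maker, zip(repeat(structure_path), structures))


if __name__ == '__main__':
    results = run_all('samething', [1, 3, 6, 7, 8, 10, 13, 25, 27])
    print(results[:2])
```

Compared with functools.partial(), this form may be more convenient if the first argument ever needs to vary per call, since each tuple can then carry its own path. As with map(), the results come back in the same order as the inputs.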