Tags: python, python-2.7, parallel-processing, python-multiprocessing

Two functions in parallel with multiple arguments and return values


I've got two separate functions. Each of them takes quite a long time to execute.

def function1(arg):
    do_some_stuff_here
    return result1

def function2(arg1, arg2, arg3):
    do_some_stuff_here
    return result2

I'd like to launch them in parallel, get their results (knowing which result came from which function) and process the results afterwards. From what I understand, multiprocessing is more efficient than threading for CPU-bound work in Python 2.7, because the GIL lets only one thread run Python bytecode at a time. However, I'm not sure whether it is better to use Process, Pool or Queue, or how to implement them in a correct, Pythonic way for my use case.

Any help appreciated ;)


Solution

  • First of all, Process, Pool and Queue each have a different use case.

    Process is used to spawn a single child process: you create a Process object with a target function and call start() on it.

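    from multiprocessing import Process
    
    def method1():
        print "in method1"
        print "in method1"
    
    def method2():
        print "in method2"
        print "in method2"
    
    if __name__ == '__main__':          # required on Windows, where children re-import this module
        p1 = Process(target=method1)    # create a process object p1
        p1.start()                      # start running method1 in a child process
        p2 = Process(target=method2)
        p2.start()                      # method1 and method2 now run concurrently
        p1.join()                       # wait for both children to finish
        p2.join()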
    
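Since the question needs results back, note that a `Process` receives positional arguments through `args=` (and keyword arguments through `kwargs=`), but whatever the target returns is discarded; that is why the Queue and Pool examples are needed to collect results. A minimal sketch, where `function2` is a hypothetical stand-in for the asker's slow function, written to run on both Python 2.7 and Python 3:

```python
from multiprocessing import Process


def function2(arg1, arg2, arg3):
    # hypothetical stand-in for the asker's slow function2
    total = arg1 + arg2 + arg3
    return total  # NOTE: a Process discards this return value


if __name__ == "__main__":
    # positional arguments for the target go in a tuple
    p = Process(target=function2, args=(1, 2, 3))
    p.start()
    p.join()           # wait for the child to finish
    print(p.exitcode)  # 0 means the target returned without raising
```

Only the exit code reaches the parent; if the target raises an uncaught exception, `exitcode` is non-zero.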

    Pool is used to parallelize the execution of a function across multiple input values.

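    from multiprocessing import Pool
    
    def method1(x):
        print x
        print x**2
        return x**2
    
    if __name__ == '__main__':
        p = Pool(3)                          # pool of 3 worker processes
        result = p.map(method1, [1, 4, 9])   # blocks until every call has returned
        p.close()                            # no more tasks will be submitted
        p.join()                             # wait for the workers to exit
        print result                         # prints [1, 16, 81] (results keep input order)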
    
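`Pool.map` hands each item of the iterable to the function as a single argument, so a function like the asker's `function2(arg1, arg2, arg3)` needs a small unpacking wrapper on Python 2.7. A minimal sketch, assuming a hypothetical body for `function2` and a hypothetical helper `function2_star`. The wrapper has to be a module-level function because lambdas cannot be pickled and sent to the workers:

```python
from multiprocessing import Pool


def function2(arg1, arg2, arg3):
    # hypothetical stand-in for a slow function taking several arguments
    return arg1 * arg2 + arg3


def function2_star(args):
    # Pool.map passes exactly one item per call, so unpack the tuple here.
    return function2(*args)


if __name__ == "__main__":
    pool = Pool(2)  # two worker processes
    try:
        # each tuple becomes one call: function2(1, 2, 3), function2(4, 5, 6)
        results = pool.map(function2_star, [(1, 2, 3), (4, 5, 6)])
    finally:
        pool.close()
        pool.join()
    print(results)  # [5, 26]
```

On Python 3.3 and later, `pool.starmap(function2, [(1, 2, 3), (4, 5, 6)])` does the same unpacking without the wrapper.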

    Queue is used to communicate between processes.

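    from multiprocessing import Process, Queue
    
    def method1(x, q1):
        print "in method1"
        print "in method1"
        q1.put(x**2)        # send the result back through the queue
    
    def method2(x, q2):
        print "in method2"
        print "in method2"
        q2.put(x**3)        # a Process ignores return values, so use the queue
    
    if __name__ == '__main__':
        q1 = Queue()
        p1 = Process(target=method1, args=(4, q1))
        q2 = Queue()
        p2 = Process(target=method2, args=(2, q2))
        p1.start()
        p2.start()
        print q1.get()      # prints 16 (blocks until method1 has put its result)
        print q2.get()      # prints 8
        p1.join()           # join after get(), so a child never waits on a full queue
        p2.join()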
    
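To know which result belongs to which function, both processes can share one queue and tag each result with a name. A minimal sketch, assuming hypothetical bodies for `function1` and `function2` and a hypothetical helper `run_tagged`; it runs on both Python 2.7 and Python 3:

```python
from multiprocessing import Process, Queue


def function1(arg):
    # hypothetical stand-in for the asker's slow function1
    return arg ** 2


def function2(arg1, arg2, arg3):
    # hypothetical stand-in for the asker's slow function2
    return arg1 + arg2 + arg3


def run_tagged(queue, name, func, args):
    # Run func(*args) and put (name, result) on the shared queue,
    # so the parent can tell which result came from which function.
    queue.put((name, func(*args)))


if __name__ == "__main__":
    queue = Queue()
    jobs = [
        Process(target=run_tagged, args=(queue, "function1", function1, (4,))),
        Process(target=run_tagged, args=(queue, "function2", function2, (1, 2, 3))),
    ]
    for job in jobs:
        job.start()
    # Results arrive in completion order, so collect them into a dict by tag.
    # Read everything before join() so no child blocks on a full queue.
    results = dict(queue.get() for _ in jobs)
    for job in jobs:
        job.join()
    print(results["function1"])  # 16
    print(results["function2"])  # 6
```

Because results come back in whichever order the processes finish, the tag, not the position, identifies each one.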

    For your case, you can either use Process with Queue (the third example) or adapt Pool to call different functions, as below:

    import itertools
    import sys
    from multiprocessing import Pool
    
    def method1(x):
        print x
        print x**2
        return x**2
    
    def method2(x):
        print x
        print x**3
        return x**3
    
    def distributor(option_args):
        option, args = option_args              # unpack the (option, arg) pair
    
        attr_name = "method" + str(option)      # e.g. option 1 -> "method1"
    
        # look up the module-level function with that name and call it with args
        value = getattr(sys.modules[__name__], attr_name)(args)
        return value
    
    if __name__ == '__main__':
        option_list = [1, 2]     # selects the method number
        args_list = [4, 2]       # argument for the corresponding method (4 goes to method1)
    
        p = Pool(3)              # pool of 3 worker processes
        # izip pairs the lists as (1, 4), (2, 2); each pair is sent to distributor
        # (use the built-in zip on Python 3)
        result = p.map(distributor, itertools.izip(option_list, args_list))
        p.close()
        p.join()
        print result             # prints [16, 8]
    
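For two different functions with different signatures, `Pool.apply_async` avoids looking functions up by name: each call returns its own result object, so there is no doubt which result is which. A minimal sketch, assuming hypothetical bodies for `function1` and `function2`, compatible with Python 2.7 and Python 3:

```python
from multiprocessing import Pool


def function1(arg):
    # hypothetical stand-in for the asker's slow function1
    return arg ** 2


def function2(arg1, arg2, arg3):
    # hypothetical stand-in for the asker's slow function2
    return arg1 + arg2 + arg3


if __name__ == "__main__":
    pool = Pool(2)  # one worker per function
    try:
        # apply_async returns immediately with a result handle,
        # so both calls run in parallel and each handle is tied to its function.
        async1 = pool.apply_async(function1, (4,))
        async2 = pool.apply_async(function2, (1, 2, 3))
        result1 = async1.get()  # blocks until function1 has finished
        result2 = async2.get()  # re-raises here if function2 raised in the worker
    finally:
        pool.close()
        pool.join()
    print(result1)  # 16
    print(result2)  # 6
```

Each function keeps its own argument tuple, and `get()` surfaces worker exceptions in the parent, so a failure in either function is not silently lost.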

    Hope this helps.