
Python multiprocessing using pool.map with list


I am working on Python code that uses multiprocessing. Below is the code:

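import logging
import multiprocessing
import os

# `logger` was used but never defined; configure it at module level so that
# worker processes (which re-import this module under "spawn") also have it.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def square(n):
    #logger.info("Worker process id for {0}: {1}".format(n, os.getpid()))
    logger.info("Evaluating square of the number {0}".format(n))
    print('process id of {0}: {1}'.format(n, os.getpid()))
    return n * n

if __name__ == "__main__":
    # input list
    mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

    # creating a pool object; "with" shuts the pool down when the block ends
    with multiprocessing.Pool(4) as p:
        # map list to target function
        result = p.map(square, mylist)

    print(result)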

My server has 4 CPU cores. When I pass 4 to the Pool, only a single process seems to be started. Shouldn't it start 4 separate processes?

If I set the value to 8 in the Pool object, this is the output I get:

process id of 1: 25872
process id of 2: 8132
process id of 3: 1672
process id of 4: 27000
process id of 6: 25872
process id of 5: 20964
process id of 9: 25872
process id of 8: 1672
process id of 7: 8132
process id of 10: 27000
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

This started 5 separate processes (25872, 8132, 1672, 27000, 20964) even though there are only 4 CPU cores.

  1. I don't understand why the pool initiated only 1 process when the value is 4 and initiated 5 separate processes when the value is 8.

  2. Can pool object be instantiated with a value greater than the number of CPU cores?

  3. Also, what is the optimal value to use when instantiating the Pool object if the list contains a million records?

I have been through the official Python documentation, but I couldn't find this information. Please help.


Solution

  • Let's answer one by one.

    1. I don't understand why the pool initiated only 1 process when the value is 4 and initiated 5 separate processes when the value is 8.

    The pool did start 4 worker processes (and 8 in your second run). Do not confuse the number of cores you have with the number of processes; the two are completely independent. The PIDs in your output all belong to pool workers, because `square` (and therefore the `print`) runs only in the workers; the main Python process never calls it, so it does not show up in that list. If you see that only some of the workers are being used, it most likely means that the first workers to start finished the tasks fast enough that the others were never needed: squaring a number is so cheap that a worker can grab the next item before the remaining workers are even ready.
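To check this yourself, here is a minimal sketch (not from the original answer) that asks `multiprocessing.active_children()` how many workers the pool started before any task was submitted, and records which worker PID computed each square. The names `square_with_pid` and `main`, and the 0.2-second `time.sleep`, are illustrative assumptions; the sleep just makes each task slow enough that the work is more likely to spread over several workers.

```python
import multiprocessing
import os
import time

def square_with_pid(n):
    # Hypothetical slow task: the sleep gives other workers time to start
    # and pick up items, unlike the near-instant n * n in the question.
    time.sleep(0.2)
    return n * n, os.getpid()

def main():
    with multiprocessing.Pool(4) as pool:
        # The pool starts all of its workers up front, before any task is sent.
        workers_started = len(multiprocessing.active_children())
        # chunksize=1 hands out one number at a time to whichever worker is free.
        results = pool.map(square_with_pid, range(1, 11), chunksize=1)
    squares = [sq for sq, _ in results]
    worker_pids = {pid for _, pid in results}
    print("workers started:", workers_started)
    print("squares:", squares)
    print("main process pid:", os.getpid())
    print("distinct worker pids that did work:", len(worker_pids))
    return workers_started, squares, worker_pids

if __name__ == "__main__":
    main()
```

You should typically see 4 workers reported and a main-process PID that never appears among the worker PIDs. How many distinct workers actually receive tasks still depends on timing, so that last number can vary from run to run.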

    2. Can pool object be instantiated with a value greater than the number of CPU cores?

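    Yes, you can use any number you want (although the OS may impose some limit on the number of processes). Keep in mind, though, that for CPU-bound work this will just overload your CPU: the extra processes compete for the same cores and add switching overhead without making anything faster. More explanation below.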
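A minimal sketch illustrating both points, offered as an illustration rather than a benchmark: `Pool` starts more workers than `os.cpu_count()` reports without complaint, and for tasks that mostly wait instead of computing, those extra processes can still help. The names `wait_task` and `run`, the 0.5-second sleep standing in for I/O, and the pool sizes are all hypothetical choices.

```python
import multiprocessing
import os
import time

def wait_task(n):
    # Hypothetical stand-in for an I/O-bound job (e.g. waiting on a network
    # reply): the process sleeps instead of using the CPU.
    time.sleep(0.5)
    return n

def run(processes, tasks=8):
    """Run `tasks` waiting jobs on a pool of `processes` workers.

    Returns (number of workers started, elapsed seconds).
    """
    start = time.perf_counter()
    with multiprocessing.Pool(processes) as pool:
        # Pool does not compare `processes` with the core count; it starts them all.
        workers = len(multiprocessing.active_children())
        pool.map(wait_task, range(tasks), chunksize=1)
    return workers, time.perf_counter() - start

if __name__ == "__main__":
    cores = os.cpu_count() or 1  # cpu_count() can return None
    print("cores reported:", cores)
    for size in (2, cores + 2):  # cores + 2 is always more than the core count
        workers, seconds = run(size)
        print(f"Pool({size}): {workers} workers started, {seconds:.2f}s")
```

With sleeping tasks the larger pool should usually finish sooner, because sleeping processes do not compete for CPU time. If you replace the sleep with CPU-bound work, going beyond the core count stops helping, which is the overload described above.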

    3. Also, what is the optimal value to use when instantiating the Pool object if the list contains a million records?

    Usually the "optimal" setting is one where all the cores of your CPU are fully used by your pool. So, if you have 4 cores, 4 processes is the best option. This is not always exactly true (for example, work that spends most of its time waiting on I/O can benefit from more processes), but it is a good starting approximation.
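A minimal sketch for the million-record case, assuming a CPU-bound function like `square`: leave `processes` at its default (`None`), which makes `Pool` use the CPU count Python reports (`os.cpu_count()`, or `os.process_cpu_count()` on newer Python versions), and pass `chunksize` to `map` explicitly. `map` already picks a chunk size automatically when you omit it; setting it yourself just lets you control how many items each worker receives per round trip. The helper name `square_all` and the value 10,000 are illustrative assumptions, not tuned numbers.

```python
import multiprocessing
import os

def square(n):
    return n * n

def square_all(numbers, processes=None, chunksize=10_000):
    # processes=None: one worker per CPU that Python reports, the
    # "one process per core" starting point for CPU-bound work.
    # chunksize=10_000: arbitrary starting value; larger chunks mean fewer
    # round trips between the main process and the workers.
    with multiprocessing.Pool(processes) as pool:
        return pool.map(square, numbers, chunksize=chunksize)

if __name__ == "__main__":
    print("cores reported:", os.cpu_count())
    result = square_all(range(1_000_000))
    print(len(result), result[:5], result[-1])
```

From there, try a few pool sizes and chunk sizes and measure, since the best values depend on how expensive each item is.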

    One last note. You wrote:

    I have been through the official Python documentation, but I couldn't find this information.

    This is not really Python-specific; it is general computer-science (operating-system) behavior.