I am working on Python code that uses multiprocessing. Below is the code:
import logging
import multiprocessing
import os

# logger was not defined in the snippet; a basic module-level logger is assumed here
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def square(n):
    # logger.info("Worker process id for {0}: {1}".format(n, os.getpid()))
    logger.info("Evaluating square of the number {0}".format(n))
    print('process id of {0}: {1}'.format(n, os.getpid()))
    return n * n

if __name__ == "__main__":
    # input list
    mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    # creating a pool object
    p = multiprocessing.Pool(4)
    # map list to target function
    result = p.map(square, mylist)
    print(result)
My server has 4 CPU cores. If I pass 4 to Pool, only a single process is used. In general, it should start 4 separate processes, right?
If I set the value to 8 in the Pool object, below is the output I get:
process id of 1: 25872
process id of 2: 8132
process id of 3: 1672
process id of 4: 27000
process id of 6: 25872
process id of 5: 20964
process id of 9: 25872
process id of 8: 1672
process id of 7: 8132
process id of 10: 27000
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
This used 5 separate processes (25872, 8132, 1672, 27000, 20964) even though there are only 4 CPU cores.
I don't understand why the pool used only 1 process when the value is 4 and 5 separate processes when the value is 8.
Can a Pool object be instantiated with a value greater than the number of CPU cores?
Also, what is the optimal value to use when instantiating the Pool object if the list contains a million records?
I have been through the official Python documentation, but I couldn't find this information. Please help.
Let's answer these one by one.
- I don't understand why the pool used only 1 process when the value is 4 and 5 separate processes when the value is 8.
The pool does start 4 worker processes when you pass 4, and 8 when you pass 8. Do not confuse the number of cores you have with the number of processes; the two are completely independent. What your output shows is not how many processes exist but which workers actually ran square, because os.getpid() is called inside the worker. The main Python process also exists, but it never runs square, so its PID does not appear in that list. With 8, five of the eight workers happened to pick up tasks. When you see only a few processes (or just one) being used, it probably means they finish the tasks so quickly that the other workers never get a chance to take any.
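To check this yourself, here is a minimal sketch that prints how many workers the pool started and how many tasks each one handled. The names square_with_pid and tasks_per_worker are made up for this example, and the 0.1 s sleep is an arbitrary delay added only so that one worker cannot finish everything before the others start. The exact distribution of tasks will vary from run to run.

```python
import multiprocessing
import os
import time
from collections import Counter


def square_with_pid(n):
    # Sleep briefly so a single worker cannot drain the whole queue
    # before the others are ready (the delay is an arbitrary choice).
    time.sleep(0.1)
    return n * n, os.getpid()


def tasks_per_worker(pids):
    # Count how many tasks each worker PID handled.
    return dict(Counter(pids))


if __name__ == "__main__":
    with multiprocessing.Pool(4) as pool:
        # The workers exist as soon as the pool is created,
        # before any task has been sent to them.
        print("workers started:", len(multiprocessing.active_children()))
        results = pool.map(square_with_pid, range(1, 11))

    squares = [sq for sq, _ in results]
    pids = [pid for _, pid in results]
    print(squares)
    print("tasks per worker:", tasks_per_worker(pids))
```

If you remove the sleep, you will likely see the work concentrated in one or two PIDs again, which is the behavior from the question.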
- Can a Pool object be instantiated with a value greater than the number of CPU cores?
Yes, you can pass any number you want (although the OS may impose some limit). But note that for CPU-bound work, creating more processes than cores does not make things faster: the extra processes just compete for the same cores and add scheduling overhead. More explanation below.
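If you want to see the effect on your own machine, a rough timing sketch like the following can help. cpu_bound and time_pool are hypothetical helpers written for this example, the task sizes are arbitrary, and the numbers you get depend heavily on your hardware and OS, so treat it as a way to experiment and not as a benchmark.

```python
import multiprocessing
import os
import time


def cpu_bound(n):
    # Pure computation with no I/O, so it is limited by CPU time.
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_pool(processes, tasks):
    # Wall-clock seconds needed to map cpu_bound over tasks
    # with a pool of the given size (includes pool start-up time).
    start = time.perf_counter()
    with multiprocessing.Pool(processes) as pool:
        pool.map(cpu_bound, tasks)
    return time.perf_counter() - start


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    tasks = [100_000] * 32  # arbitrary workload for illustration
    for size in (1, cores, 2 * cores):
        print(f"{size:>3} processes: {time_pool(size, tasks):.2f} s")
```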
- Also, what is the optimal value to use when instantiating the Pool object if the list contains a million records?
Usually the "optimal" setup is the one that keeps all the cores of your CPU fully busy. So if you have 4 cores, 4 processes is a good starting point regardless of how long the list is, although the best value can differ in practice (for example, when the tasks spend time waiting on I/O), so measure before settling on a number.
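As a starting point for a large list, something like this sketch sizes the pool from os.cpu_count() and passes a chunksize to map so the items are sent to the workers in batches instead of one at a time. pool_size is a helper name invented here, and the chunksize of 10_000 is only a guess to tune by measurement.

```python
import multiprocessing
import os


def square(n):
    return n * n


def pool_size():
    # os.cpu_count() can return None when the count cannot be determined,
    # so fall back to a single process in that case.
    return os.cpu_count() or 1


if __name__ == "__main__":
    numbers = range(1_000_000)
    with multiprocessing.Pool(pool_size()) as pool:
        # chunksize groups items into batches per task, which reduces the
        # inter-process communication overhead for many tiny tasks.
        result = pool.map(square, numbers, chunksize=10_000)
    print(len(result), result[:5])
```

Calling multiprocessing.Pool() with no argument also picks a size based on the number of CPUs, so the explicit pool_size() is mainly there to make the choice visible.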
One last note:
I have been through the official Python documentation, but I couldn't find this information.
This is not really Python-specific; it is general behavior in computing, since the operating system decides which process runs on which core.