Tags: python, linux, parallel-processing, python-multiprocessing

Multiprocessing: use only the physical cores?


I have a function foo that consumes a lot of memory, and I would like to run several instances of it in parallel.

Suppose I have a CPU with 4 physical cores, each with two logical cores.

My system has enough memory to accommodate 4 instances of foo in parallel but not 8. Moreover, since 4 of these 8 cores are logical ones anyway, I do not expect that using all 8 cores would provide much gain over using only the 4 physical ones.

So I want to run foo on the 4 physical cores only. In other words, I would like to ensure that multiprocessing.Pool(4) (4 being the maximum number of concurrent runs of the function I can accommodate on this machine due to memory limitations) dispatches the jobs to the four physical cores (and not, for example, to a combination of two physical cores and their two logical siblings).

How can I do that in Python?

Edit:

I earlier used a code example from multiprocessing, but I am library-agnostic, so I removed it to avoid confusion.


Solution

  • I found a solution that doesn't involve changing the source code of a Python module. It uses the approach suggested here. After running that script, one can check that only the physical cores are active by running the following in bash:

    lscpu

    which returns:

    CPU(s):                8
    On-line CPU(s) list:   0,2,4,6
    Off-line CPU(s) list:  1,3,5,7
    Thread(s) per core:    1
    

    (The script linked above can also be run from within Python.) In any case, after running it, entering these commands in Python:

    import multiprocessing
    multiprocessing.cpu_count()
    

    returns 4.
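
If taking CPUs offline is not an option, a possible alternative is to leave every CPU online and pin each pool worker to one logical CPU per physical core. The following is only a minimal sketch, not the method used by the script above. It assumes a Linux system that exposes hyperthread siblings under /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. The helper names (parse_cpu_list, one_cpu_per_core, read_sibling_lists, pin_worker) are made up for this illustration, and foo is a stand-in for the memory-hungry function from the question.

```python
import glob
import multiprocessing
import os


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as "0,4" or "0-3,8" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus


def one_cpu_per_core(sibling_lists, allowed=None):
    """Pick the lowest-numbered usable logical CPU from each sibling group."""
    chosen = set()
    for text in sibling_lists:
        siblings = parse_cpu_list(text)
        if allowed is not None:
            siblings = [c for c in siblings if c in allowed]
        if siblings:
            chosen.add(min(siblings))
    return sorted(chosen)


def read_sibling_lists():
    """Read the sibling list of every CPU that exposes topology info (Linux only)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as fh:
            lists.append(fh.read())
    return lists


def pin_worker(cpu_queue):
    """Pool initializer: take one CPU id from the queue and pin this worker to it."""
    os.sched_setaffinity(0, {cpu_queue.get()})


def foo(x):
    # Stand-in for the memory-hungry function from the question.
    return x * x


def main():
    sibling_lists = read_sibling_lists()
    if not sibling_lists:
        print("No CPU topology information found; this sketch is Linux-only.")
        return
    # Only consider CPUs this process is actually allowed to run on.
    cpus = one_cpu_per_core(sibling_lists, allowed=os.sched_getaffinity(0))
    queue = multiprocessing.Queue()
    for cpu in cpus:
        queue.put(cpu)
    # One worker per physical core, each pinned to a different core.
    with multiprocessing.Pool(len(cpus), initializer=pin_worker,
                              initargs=(queue,)) as pool:
        print(pool.map(foo, range(8)))


if __name__ == "__main__":
    main()
```

On the 4-core machine from the question this should start 4 workers, matching the memory limit. Note that the queue holds exactly one CPU id per worker, so a worker that is replaced later (for example when using maxtasksperchild) would block in the initializer; a real implementation would need to handle that case.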