python linux parallel-processing python-multiprocessing

Multiprocessing: use only the physical cores?

I have a function foo which consumes a lot of memory and which I would like to run several instances of in parallel.

Suppose I have a CPU with 4 physical cores, each with two logical cores.

My system has enough memory to accommodate 4 instances of foo in parallel but not 8. Moreover, since 4 of these 8 cores are logical ones anyway, I also do not expect using all 8 cores will provide much gains above and beyond using the 4 physical ones only.

So I want to run foo on the 4 physical cores only. In other words, I would like to ensure that doing multiprocessing.Pool(4) (4 being the maximum number of concurrent run of the function I can accommodate on this machine due to memory limitations) dispatches the job to the four physical cores (and not, for example, to a combo of two physical cores and their two logical offsprings).

How to do that in python?

Edit:

I earlier used a code example from multiprocessing but I am library agnostic ,so to avoid confusion, I removed that.

Solution

I found a solution that doesn't involve changing the source code of a python module. It uses the approach suggested here. One can check that only the physical cores are active after running that script by doing:

lscpu

in the bash returns:

CPU(s):                8
On-line CPU(s) list:   0,2,4,6
Off-line CPU(s) list:  1,3,5,7
Thread(s) per core:    1

[One can run the script linked above from within python]. In any case, after running the script above, typing these commands in python:

import multiprocessing
multiprocessing.cpu_count()

returns 4.