Tags: c, linux, process-management

What's the correct value to use for the maximum number of CPUs when calling sched_setaffinity?


I'm confused about the correct value to use for the number of CPUs when building a CPU_SET for a sched_setaffinity call on my system.

My /proc/cpuinfo file:

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 37
model name  : Intel(R) Core(TM) i5 CPU       M 460  @ 2.53GHz
stepping    : 5
microcode   : 0x2
cpu MHz     : 1199.000
cache size  : 3072 KB
physical id : 0
siblings    : 4
core id     : 0
cpu cores   : 2
apicid      : 0
initial apicid  : 0
fdiv_bug    : no
f00f_bug    : no
coma_bug    : no
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt lahf_lm ida arat dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5056.34
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 37
model name  : Intel(R) Core(TM) i5 CPU       M 460  @ 2.53GHz
stepping    : 5
microcode   : 0x2
cpu MHz     : 1199.000
cache size  : 3072 KB
physical id : 0
siblings    : 4
core id     : 0
cpu cores   : 2
apicid      : 1
initial apicid  : 1
fdiv_bug    : no
f00f_bug    : no
coma_bug    : no
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt lahf_lm ida arat dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5056.34
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model       : 37
model name  : Intel(R) Core(TM) i5 CPU       M 460  @ 2.53GHz
stepping    : 5
microcode   : 0x2
cpu MHz     : 1199.000
cache size  : 3072 KB
physical id : 0
siblings    : 4
core id     : 2
cpu cores   : 2
apicid      : 4
initial apicid  : 4
fdiv_bug    : no
f00f_bug    : no
coma_bug    : no
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt lahf_lm ida arat dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5056.34
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 6
model       : 37
model name  : Intel(R) Core(TM) i5 CPU       M 460  @ 2.53GHz
stepping    : 5
microcode   : 0x2
cpu MHz     : 1199.000
cache size  : 3072 KB
physical id : 0
siblings    : 4
core id     : 2
cpu cores   : 2
apicid      : 5
initial apicid  : 5
fdiv_bug    : no
f00f_bug    : no
coma_bug    : no
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt lahf_lm ida arat dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips    : 5056.34
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

In this file there are processor entries numbered 0-3 (4 processors total), which I take to be the "physical" processors. I can get this value from sysconf(_SC_NPROCESSORS_ONLN), but there is also a cpu cores line, and each processor shows 2 of them. I believe this accounts for the "logical" processors, i.e. hyper-threading. Should I be using only the "physical" value, or can I use the "logical" count?

I'm not clear on this because /proc/PID/status has a Cpus_allowed_list line that can range from 0-7 (8 processors total), but I also wrote a script that calls taskset -c -p PID for every running PID, and that shows every process as having an affinity list of 0-3 at most.
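
For reference, the two numbers can be compared from code; a minimal sketch that just prints the sysconf count next to the affinity mask the kernel reports for the calling process:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    cpu_set_t set;

    /* pid 0 == the calling process; ask which CPUs it may run on */
    if (sched_getaffinity(0, sizeof(set), &set) == -1) {
        perror("sched_getaffinity");
        return 1;
    }

    printf("online CPUs (sysconf): %ld\n", sysconf(_SC_NPROCESSORS_ONLN));
    printf("CPUs in affinity mask: %d\n", CPU_COUNT(&set));
    return 0;
}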


Solution

For hyper-threading you get 2 logical CPUs per core. This means that if one logical CPU stalls for any reason (cache miss, branch misprediction, instruction dependencies, etc.) the core can execute instructions from the other logical CPU instead of sitting idle. In addition, the core is typically capable of doing more in parallel than a single logical CPU uses, so even without any of those (quite common) stalls you still get a benefit by increasing utilisation of the core's resources. In this case, you want to use all logical CPUs.
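
As a minimal sketch of that case, assuming sysconf(_SC_NPROCESSORS_ONLN) as the upper bound (it counts online logical CPUs, i.e. the processor entries in /proc/cpuinfo):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);  /* online logical CPUs */
    cpu_set_t set;

    CPU_ZERO(&set);
    for (long i = 0; i < ncpus; i++)
        CPU_SET(i, &set);                        /* allow every logical CPU */

    /* pid 0 == the calling process */
    if (sched_setaffinity(0, sizeof(set), &set) == -1) {
        perror("sched_setaffinity");
        return 1;
    }
    return 0;
}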

For badly written multi-threaded software (software with significant scalability problems), the gains from hyper-threading can be lost to that poor scalability. For example, the process might cause "cache line bouncing" (where cache lines are frequently "bounced" between cores), and using affinity to reduce the number of cores can help. For another example, RAM bandwidth might be the bottleneck (so the process gets no benefit from hyper-threading), and using affinity to prevent the process from using both logical CPUs in each core can improve performance. For these cases, you only want to use some of the logical CPUs (but don't know in advance which ones).
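
One way to build such a mask is to claim one logical CPU per core using the kernel's sysfs topology files. A sketch, assuming a single socket (so core_id alone identifies a core) and a small fixed table of core ids:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
    int seen[256] = {0};              /* core_ids already claimed */
    cpu_set_t set;

    CPU_ZERO(&set);
    for (long cpu = 0; cpu < ncpus; cpu++) {
        char path[64];
        int core_id;
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%ld/topology/core_id", cpu);
        f = fopen(path, "r");
        if (!f)
            continue;
        if (fscanf(f, "%d", &core_id) == 1 &&
            core_id >= 0 && core_id < 256 && !seen[core_id]) {
            seen[core_id] = 1;        /* first sibling of this core wins */
            CPU_SET(cpu, &set);
        }
        fclose(f);
    }

    if (sched_setaffinity(0, sizeof(set), &set) == -1) {
        perror("sched_setaffinity");
        return 1;
    }
    return 0;
}

On the /proc/cpuinfo shown above this would select CPUs 0 and 2 (the first siblings of cores 0 and 2).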

For single-threaded processes, it's not going to matter what you do.

Basically (assuming multi-threaded code), the best setting depends on the process itself, so you should run some tests to see how affinity affects your process.
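
For example, on the layout above you could run your program under taskset -c 0-3 and again under taskset -c 0,2 (one logical CPU per core here) and compare timings.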

Misc. Notes

When hyper-threading was first introduced (Netburst/Pentium 4) it was "less than ideal", and the schedulers in most operating systems weren't optimised to schedule load efficiently for hyper-threading (which made it even worse). This led a lot of people to think that hyper-threading is bad in many cases. Modern Intel CPUs do not have the problems that Netburst/Pentium 4 had, and modern operating system schedulers do have optimisations for hyper-threading, so the old assumption ("hyper-threading is probably bad"), while correct back then, is mostly obsolete and wrong now.