Search code examples
windowsmultithreadingprocesscontext-switch

Are threads from multiple processes actually running at the same time


In a Windows operating system with 2 physical x86/amd64 processors (P0 + P1), running 2 processes (A + B), each with two threads (T0 + T1), is it possible (or even common) to see the following:

P0:A:T0 running at the same time as P1:B:T0

then, after 1 (or is that 2?) context switch(es?)

P0:B:T1 running at the same time as P1:A:T1

In a nutshell, I'd like to know if - on a multiple processor machine - the operating system is free to schedule any thread from any process at any time, regardless of what other threads from other processes are already running.

EDIT: To clarify the silly example, imagine that process A's thread A:T0 has affinity to processor P0 (and A:T1 to P1,) while process B's thread B:T0 has affinity to processor P1 (and B:T1 to to P0). It probably doesn't matter whether these processors are cores or sockets.

Is there a first-class concept of a process context switch? Perfmon shows context switches under the Thread object, but nothing under the Process object.


Solution

  • Yes, it is possible and it happens pretty often.
    The OS tries to not switch one thread between CPUs (you can make it try harder setting the threads preferred processor, or you can even lock it to single processor via affinity).
    Windows' process is not an execution unit by itself - from this viewpoint, its basically just a context for its threads.

    EDIT (further clarifications)

    There's nothing like a "process context switch". Basically, the OS scheduler assigns the threads via a (very adaptive) round-robin algorithm to any free processor/core (as the affinity allows), if the "previous" processor isn't immediately available, regardless of the processes (which means multi-threaded processes can steal much more CPU power).

    This "jumping" may seem expensive, considering at least the L1 (and sometimes L2) caches are per-core (apart from different slot/package processors), but it's still cheaper than delays caused by waiting to the "right" processor and inability to do elaborate load-balancing (which the "jumping" scheme makes possible).
    This may not apply to the NUMA architecture, but there are much more considerations invoved (e.g. adapting all memory-allocations to be thread- and processor-bound and avoiding as much state/memory sharing as possible).

    As for affinity: you can set affinity masks per-thread or per-process (which supersedes all process' threads' settings), but the OS enforces least one logical processor affiliated per thread (you never end up with a zero mask).

    A process' default affinity mask is inherited from its parent process (which allows you to create single-core loaders for problematic legacy executables), and threads inherit the mask from the process they belong to.

    You may not set a threads affinity to a processor outside the process' affinity, but you can further limit it.

    Any thread by default, will jump between the available logical processors (especially if it yields, calls to kernel, etc), may jump even if it has its preferred processor set, but only if it has to, but it will NOT jump to a processor outside its affinity mask (which may lead to considerable delays).

    I'm not sure if the scheduler sees any difference between physical and hyper-threaded processors, but even if it doesn't (which I assume), the consequences are in most cases not of a concern, i.e. there should not be much difference between multiple threads sharing physical or logical processors if the thread count is just the same. Regardless, there are some reports of cache-thrashing in this scenario, mainly in high-performance heavily multithreaded applications, like SQL server or .NET and Java VMs, which may or may not benefit from HyperThreading turned off.