Tags: python, concurrency, concurrent.futures

Why can ProcessPoolExecutor use a single process?


I have the same python code that is executed on my Windows 10 machine and a Linux server.

Relevant part looks like this:

import asyncio
from concurrent.futures import ProcessPoolExecutor
....

def Calculate(arg1, arg2):
    ...
    logging.info(<stuff>)

async def main():
    with ProcessPoolExecutor() as executor:
        for _ in range(10):
            executor.submit(Calculate, 1, 2)
    logging.info(<stuff>) 

if __name__ == '__main__':
    asyncio.run(main())

Now, on my Linux machine, the code will run even without the `if __name__ == '__main__':` line, because it doesn't use child processes! I know because logging.info shows the same thread for all Calculate() calls as well as for the main code inside the `with ProcessPoolExecutor` block...

However, the calculation is still sped up considerably!

On my Windows 10 machine, the code will not run without the `if __name__ == '__main__':` line, because it starts separate processes, each of which initializes the .py module that contains the code, which executes asyncio.run again from the child processes, and that throws.

I thought maybe it thinks there is no need for separate processes, but the Linux machine has 10 cores and python sees this (os.cpu_count()). And in the debugger, I see "subprocesses" appear in the call stack on Linux!

  1. What could be going on?
  2. Why is it using multiple processes on Windows 10 but not on Linux?
  3. Why is it faster on Linux if it's using a single process?
  4. Why do I see subprocesses in call stack on Linux if it's running all code in the same (main) thread?

Solution

  • Now, on my Linux machine, the code will run even without the `if __name__ == '__main__':` line, because it doesn't use child processes! I know because logging.info shows the same thread for all Calculate() calls as well as for the main code inside the `with ProcessPoolExecutor` block...

    Thread information is not an indication that things are in the same process - the snippet above does create subprocesses and execute the tasks there, even on Linux.

    The difference you are seeing is due to the fact that on Linux, in contrast to Windows and macOS, the creation of a subprocess defaults to the .fork call under the hood: this makes the code in the new process simply keep running from the point where it was forked. On the other OSes, the .spawn method is used, which requires that the __main__ module be re-imported (this second import comes with a different content in the __name__ variable, and that is why the `if __name__ == '__main__':` pattern is needed: otherwise the subprocess can't distinguish itself from the root process and will act as if it were it).
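    You can check this directly. A minimal sketch (assuming Python 3.7+, where ProcessPoolExecutor accepts the `mp_context` parameter) that prints the default start method and shows how to opt into "spawn" on Linux to reproduce the Windows behavior:

    ```python
    import multiprocessing
    from concurrent.futures import ProcessPoolExecutor

    # The default start method is "fork" on Linux and "spawn" on
    # Windows and macOS (since Python 3.8 on macOS).
    print(multiprocessing.get_start_method())

    # To reproduce the Windows behavior on Linux, hand the executor
    # an explicit "spawn" context:
    spawn_ctx = multiprocessing.get_context('spawn')
    executor = ProcessPoolExecutor(mp_context=spawn_ctx)
    executor.shutdown()
    ```

    With the "spawn" context, the Linux run will also fail without the `if __name__ == '__main__':` guard, just like on Windows.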

    Also, the .fork() call clones a lot of your threads' metadata - use os.getpid() to get the process ID and verify that you are actually running in different processes.
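    For example, a minimal variant of the snippet above (with a hypothetical `calculate` worker that returns its PID instead of logging) makes the separate processes visible:

    ```python
    import os
    from concurrent.futures import ProcessPoolExecutor

    def calculate(arg1, arg2):
        # Report which process actually ran the task.
        return os.getpid()

    def main():
        with ProcessPoolExecutor() as executor:
            futures = [executor.submit(calculate, 1, 2) for _ in range(10)]
            worker_pids = {f.result() for f in futures}
        print("parent pid :", os.getpid())
        print("worker pids:", worker_pids)  # differ from the parent pid
        return worker_pids

    if __name__ == '__main__':
        main()
    ```

    The worker PIDs will never include the parent's own PID, on either OS.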

    Let's see if this gets it covered so far:

    1. What could be going on? Nothing out of the ordinary - just different behaviors on different OSes, but still multiple processes.
    1. Why is it using multiple processes on Windows 10 but not on Linux? It does run multiple processes on Linux as well; just the way each subprocess is started is different, with less overhead.
    1. Why is it faster on Linux if it's using a single process? You don't mention it being "faster" than on Windows - but if it is, it might be because subprocess startup and inter-process communication are more efficient there. Overall, both systems should be roughly equivalent.
    1. Why do I see subprocesses in the call stack on Linux if it's running all code in the same (main) thread? It is not - it really is using subprocesses. It's just that in each subprocess the main thread happens to be the "main" one. (Your snippet does not show which thread metadata you are printing. The names will certainly coincide; the ids should differ, but might coincide as well. os.getpid() will give you the real information anyway.)
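    To see both facts at once - the thread name coinciding while the PIDs differ - here is a small sketch (the `identify` helper is hypothetical, not from the original code):

    ```python
    import os
    import threading
    from concurrent.futures import ProcessPoolExecutor

    def identify(_):
        # Each worker process has its own main thread, so the thread
        # name alone cannot tell processes apart.
        return threading.current_thread().name, os.getpid()

    def show():
        with ProcessPoolExecutor(max_workers=2) as executor:
            results = list(executor.map(identify, range(4)))
        thread_names = {name for name, _ in results}
        pids = {pid for _, pid in results}
        print(thread_names)  # a single name, typically 'MainThread'
        print(pids)          # but distinct process ids
        return thread_names, pids

    if __name__ == '__main__':
        show()
    ```

    Every worker reports the same thread name while the PIDs differ, which is exactly why the log output looked single-process.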