python multiprocessing fastapi event-loop class-variables

Why isn't my class attribute preserved when using multiprocessing?

I have the following class in a FastAPI application:

import asyncio
import logging
from multiprocessing import Lock, Process

from .production_status import Job as ProductionStatusJob


class JobScheduler:
    loop = None
    logger = logging.getLogger("job_scheduler")
    process_lock = Lock()
    JOBS = [ProductionStatusJob]

    @classmethod
    def start(cls) -> None:
        cls.logger.info("Starting Up (1/2)")
        Process(target=cls._loop).start()
    
    @classmethod
    def _loop(cls) -> None:
        cls.loop = asyncio.new_event_loop()
        asyncio.set_event_loop(cls.loop)
        cls.loop.create_task(cls._run())
        cls.logger.info("Startup Complete (2/2)")
        cls.loop.run_forever()
        cls.loop.close()

    @classmethod
    async def _run(cls) -> None:
        while True:
            ...

    @classmethod
    async def stop(cls) -> None:
        cls.logger.info("Shutting Down (1/2)")
        with cls.process_lock:
            cls.loop.stop()                          # <= This Line
            cls.loop.close()
        cls.logger.info("Shutdown Complete (2/2)")
        cls.loop = None

JobScheduler.start() and JobScheduler.stop() are called from the FastAPI application's startup and shutdown events.

The start method works, but stop raises an error:

File "/backend/app/main.py", line 146, in stop_job_scheduler
2023-08-16 11:46:27     await job_scheduler.stop()
2023-08-16 11:46:27   File "/backend/app/jobs/__init__.py", line 59, in stop
2023-08-16 11:46:27     cls.loop.stop()
2023-08-16 11:46:27 AttributeError: 'NoneType' object has no attribute 'stop'

But cls.loop is set in the _loop method (which runs at the end of start), so why does cls.loop still have its initial None value when stop is called?

Is there a better way to clean up the background process when the FastAPI application shuts down?

Solution

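multiprocessing in Python has some surprising behavior. For CPU-bound work it can outperform threading, because each process has its own interpreter and therefore its own GIL, but that comes with caveats. The first is that each Process really is a separate Python interpreter with its own memory. Global variables, module state, and class attributes therefore get a separate copy in each process. In your case, _loop runs in the child process, so cls.loop = ... only updates the child's copy of JobScheduler; the parent process that later calls stop() still sees the original None.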
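A minimal sketch of that behavior, using a hypothetical Scheduler class as a stand-in for JobScheduler: the child assigns the class attribute and reports what it sees through a Queue, while the parent's copy is left untouched.

```python
import multiprocessing as mp


class Scheduler:
    # Hypothetical stand-in for JobScheduler; `loop` plays the role of cls.loop.
    loop = None

    @classmethod
    def _loop(cls, queue) -> None:
        # Runs in the child process, so only the child's copy of the class changes.
        cls.loop = "set in child"
        queue.put(cls.loop)  # report back what the child sees


def demo() -> tuple:
    queue = mp.Queue()
    p = mp.Process(target=Scheduler._loop, args=(queue,))
    p.start()
    child_value = queue.get()  # read before join() to avoid blocking on a full pipe
    p.join()
    return child_value, Scheduler.loop  # the parent's copy of the attribute


if __name__ == "__main__":
    child_value, parent_value = demo()
    print("child saw:", child_value)     # child saw: set in child
    print("parent sees:", parent_value)  # parent sees: None
```

This is exactly the situation in the question: the assignment happened, just not in the process that later calls stop().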

Depending on your operating system and the chosen start method, child processes are either forked or spawned (spawn is the default on Windows and macOS). A spawned process starts fresh, as though a new Python program had just been launched, and re-imports your main module. A forked process starts with a copy of all the parent's variables as they were at the moment of the fork. Either way it is a copy: later changes in one process do not affect the other unless you synchronize them explicitly with one of the multiprocessing helpers.
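The sketch below contrasts the two start methods via multiprocessing.get_context, using a hypothetical module-level STATE dict that the parent changes before starting each child. It assumes the code is run as a script file (spawn has to re-import it), and it only tries fork where the platform offers it.

```python
import multiprocessing as mp

# Hypothetical module-level state; the parent changes it at runtime.
STATE = {"value": "initial"}


def report(queue) -> None:
    # Send back whatever this child process sees for STATE["value"].
    queue.put(STATE["value"])


def run_with(method: str) -> str:
    ctx = mp.get_context(method)
    queue = ctx.Queue()
    p = ctx.Process(target=report, args=(queue,))
    p.start()
    seen = queue.get()
    p.join()
    return seen


if __name__ == "__main__":
    STATE["value"] = "changed in parent"
    # spawn re-imports the module, so the child only sees the original value.
    print("spawn:", run_with("spawn"))  # spawn: initial
    # fork copies the parent's memory at fork time, so the change is visible.
    if "fork" in mp.get_all_start_methods():
        print("fork:", run_with("fork"))  # fork: changed in parent
```

In neither case would a change the child makes afterwards flow back to the parent.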

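You can use a Manager to share data between processes explicitly. It acts like a small local server that both processes talk to through proxy objects. For one-way messages from one process to another, you can use a Queue instead. Note that values passed through a Queue or stored in a Manager's dict or list must be picklable, so the event loop object itself cannot be handed back to the parent; the parent can only ask the child to stop its own loop.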
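Applied to the shutdown question, one possible sketch keeps the Process handle in the parent and signals the child with a multiprocessing.Event, a simpler helper than a Queue for a single on/off signal. The child creates and closes its own event loop. The names run_jobs and child_main, the 0.1 s polling interval, and the 5 s join timeout are hypothetical; the sleep stands in for the real work in _run.

```python
import asyncio
import multiprocessing as mp


async def run_jobs(stop_event) -> None:
    # Hypothetical stand-in for JobScheduler._run: loop until the parent asks us to stop.
    while not stop_event.is_set():
        await asyncio.sleep(0.1)  # placeholder for the real periodic job work


def child_main(stop_event) -> None:
    # Runs in the child process; asyncio.run creates and closes the loop here.
    asyncio.run(run_jobs(stop_event))


class JobScheduler:
    # Set by start() in the parent, so stop() (also in the parent) can see them.
    process = None
    stop_event = None

    @classmethod
    def start(cls) -> None:
        cls.stop_event = mp.Event()
        cls.process = mp.Process(target=child_main, args=(cls.stop_event,))
        cls.process.start()

    @classmethod
    def stop(cls, timeout: float = 5.0) -> None:
        if cls.process is None:
            return  # never started, or already stopped
        cls.stop_event.set()         # ask the child's loop to exit
        cls.process.join(timeout)    # wait for a clean shutdown
        if cls.process.is_alive():
            cls.process.terminate()  # fallback if the child did not exit in time
            cls.process.join()
        cls.process = None
        cls.stop_event = None


if __name__ == "__main__":
    JobScheduler.start()
    JobScheduler.stop()
    print("scheduler stopped")
```

Because stop() is synchronous and join() blocks, calling it from an async shutdown handler briefly blocks the server's event loop. At shutdown that is usually acceptable; if not, the join could be moved off the loop with asyncio.to_thread.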