Tags: python, python-3.x, concurrency, multiprocessing, python-multiprocessing

What is the right way to share a read-only configuration with multiple processes?


I have a Python application that creates a process for each element of a given inputs collection. The inputs collection contains about 8 elements, and the application periodically reads a topic to fetch them.

For each element of the inputs, I create a new process and pass that element to a function.

The function is CPU-bound in nature: it performs numerical operations.

My application has a configuration object, which is a dictionary. I load the configuration data when the main process starts and then create a pool with 8 worker processes.

What is the right mechanism to pass the configuration object to each of the processes? I don't want to increase the memory footprint of the processes.

As an example:

from multiprocessing import Pool


# cpu intensive operation
def cpu_bound(element):
    ...  # complex cpu bound op
    # I want to use config here

    return output


def get_config():
    # create configuration object
    config = {
        "version": 1,
        "disable_existing_loggers": False,
        "loggers": {
            "": {
                "level": "INFO"
            }, 
            "another.module": {
                "level": "DEBUG"
            }
        }
    }
    return config


def pool_handler(inputs):
    with Pool(8) as p:  # 8 core machine
        results = p.map(cpu_bound, inputs)
    return results


if __name__ == "__main__":

    config = get_config()
    # get inputs from a topic
    inputs = get_inputs()
    results = pool_handler(inputs)

Question: What is the recommended approach to use the configuration within each process? The configuration is read-only in nature, as I only need to load it once at application start-up. There are multiple ways to do this, but which is recommended for this scenario?


Solution

  • The correct way to share static information with a multiprocessing.Pool is to set it via the initializer function and its initargs parameter.

    These two parameters are in fact passed to the Pool workers as Process constructor parameters, thus following the recommendations of the multiprocessing programming guidelines:

    Explicitly pass resources to child processes

    On Unix using the fork start method, a child process can make use of a shared resource created in a parent process using a global resource. However, it is better to pass the object as an argument to the constructor for the child process.

    import multiprocessing


    variable = None
    
    
    def initializer(*initargs):
        """The initializer function is executed on each worker process
        once they start.
    
        """
        global variable
    
        variable = initargs
    
    
    def function(*args):
        """The function is executed on each parameter of `map`."""
        print(variable)
    
    
    with multiprocessing.Pool(initializer=initializer, initargs=[1, 2, 3]) as pool:
        pool.map(function, (1, 2, 3))
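
    Applied to the scenario in the question, a minimal sketch might look like the following. The inline config dict and range(8) are stand-ins for the question's get_config() and get_inputs(); both are assumptions made purely to keep the example self-contained and runnable.

    import multiprocessing


    config = None  # populated in each worker by the initializer


    def initializer(shared_config):
        """Executed once in each worker process when it starts; stores
        the read-only configuration in a module-level global."""
        global config
        config = shared_config


    def cpu_bound(element):
        # The configuration is available here as a read-only global.
        return element, config["loggers"][""]["level"]


    def pool_handler(cfg, inputs):
        with multiprocessing.Pool(
            processes=8, initializer=initializer, initargs=(cfg,)
        ) as pool:
            return pool.map(cpu_bound, inputs)


    if __name__ == "__main__":
        cfg = {"loggers": {"": {"level": "INFO"}}}  # stand-in for get_config()
        results = pool_handler(cfg, range(8))  # stand-in for get_inputs()
        print(results)

    Regardless of the start method, the configuration crosses the process boundary once per worker at start-up rather than once per task, which keeps the memory and serialisation overhead to a minimum.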