python multiprocessing locking ipc mutex

How are locks differentiated in multiprocessing?


Let's say that you have two lists created with manager.list() and two locks created with manager.Lock(). How do you assign each lock to each list? I was doing something like this:

lock1 = manager.Lock()
lock2 = manager.Lock()
list1 = manager.list()
list2 = manager.list()

and when I wanted to write to or read from a list:

lock1.acquire()
list1.pop(0)
lock1.release()

lock2.acquire()
list2.pop(0)
lock2.release()

Today I realized that there's nothing that associates lock1 with list1. Am I misunderstanding these functions?


Solution

  • TL;DR yes, and this might be an XY problem!

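    If you create a multiprocessing.Manager() and use its methods to create containers (.list() and .dict()), each individual operation on them is already synchronized through the manager process, so you don't need to deal with synchronization primitives yourself for single calls such as append() or pop(). A sequence of several calls (for example, a read followed by a write) is not atomic as a whole and still needs a lock.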

    from multiprocessing import Manager, Process, freeze_support
    
    def my_function(d, lst):
        lst.append([x**2 for x in d.values()])
    
    def main():
        with Manager() as manager:  # context-managed SyncManager
            normal_dict = {'a': 1, 'b': 2}
            managed_synchronized_dict = manager.dict(normal_dict)
            managed_synchronized_list = manager.list()  # used to store results
            p = Process(
                target=my_function,
                args=(managed_synchronized_dict, managed_synchronized_list)
            )
            p.start()
            p.join()
            print(managed_synchronized_list)
    
    if __name__ == '__main__':
        freeze_support()
        main()
    
    % python3 ./test_so_66603485.py
    [[1, 4]]
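
One caveat is worth sketching here. Each call on a managed proxy is a separate round trip to the manager process, so a read-modify-write such as `counter[key] = counter[key] + 1` can interleave with other processes unless a lock covers both steps. The helper name `add_many`, the key `"hits"`, and the worker counts below are hypothetical, chosen only for illustration:

```python
from multiprocessing import Manager, Process


def add_many(counter, lock, key, times):
    """Increment counter[key] `times` times, holding `lock` for each update."""
    for _ in range(times):
        with lock:  # covers both the read and the write
            counter[key] = counter[key] + 1


def main():
    with Manager() as manager:
        counter = manager.dict({"hits": 0})
        lock = manager.Lock()  # by convention, this lock guards `counter`
        workers = [
            Process(target=add_many, args=(counter, lock, "hits", 100))
            for _ in range(4)
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print(counter["hits"])  # 4 workers, 100 increments each


if __name__ == "__main__":
    main()
```

Without the `with lock:` line, the final count may come out lower than expected, because two processes can read the same old value before either writes back.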
    

    multiprocessing.Array is also synchronized by default (it is created with lock=True)
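
As a minimal sketch of that point: an Array created with the default `lock=True` exposes its lock through `get_lock()`, which can be held across several element updates so they happen as one step. The helper name `add_to_all` and the sample values are assumptions made for this example:

```python
from multiprocessing import Array


def add_to_all(shared, delta):
    """Add `delta` to every element while holding the array's own lock."""
    with shared.get_lock():  # the default lock is recursive, so nested access is fine
        for i in range(len(shared)):
            shared[i] += delta


counts = Array("i", [1, 2, 3])  # 'i' is a signed C int; lock=True is the default
add_to_all(counts, 10)
print(counts[:])  # [11, 12, 13]
```

Here the lock really is tied to the data, since the Array object carries it, which is the association the question was looking for.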

    BEWARE: proxy objects are not directly comparable to their Python collection equivalents. From the docs:

    Note: The proxy types in multiprocessing do nothing to support comparisons by value. So, for instance, we have:

    >>> manager.list([1,2,3]) == [1,2,3]
    False
    

    One should just use a copy of the referent instead when making comparisons.
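
A short sketch of that advice, assuming a list proxy: copy the contents into a plain list with `list(...)` and compare the copy. The helper name `equal_by_value` is hypothetical:

```python
from multiprocessing import Manager


def equal_by_value(shared, expected):
    """Compare a list-like proxy with a plain list by copying its contents."""
    return list(shared) == list(expected)


def main():
    with Manager() as manager:
        shared = manager.list([1, 2, 3])
        print(shared == [1, 2, 3])                # compares the proxy object itself
        print(equal_by_value(shared, [1, 2, 3]))  # compares a plain copy of the referent


if __name__ == "__main__":
    main()
```

For a dict proxy, `dict(proxy)` or `proxy.copy()` should serve the same purpose.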


    Some confusion might come from the Synchronization Primitives section of the multiprocessing docs, which suggests using a Manager to create synchronization primitives, when the Manager's own containers can already do the synchronization for you:

    Synchronization primitives

    Generally synchronization primitives are not as necessary in a multiprocess program as they are in a multithreaded program. See the documentation for threading module.

    Note that one can also create synchronization primitives by using a manager object – see Managers.

    If you simply call multiprocessing.Manager() then, per the docs, it:

    Returns a started SyncManager object which can be used for sharing objects between processes. The returned manager object corresponds to a spawned child process and has methods which will create shared objects and return corresponding proxies.

    From the SyncManager section:

    Its methods create and return Proxy Objects for a number of commonly used data types to be synchronized across processes. This notably includes shared lists and dictionaries.

    This means that you probably already have most of what you want:

    • manager object with methods for building managed types
    • synchronization via Proxy Objects

    Finally, to sum up the comment thread about Lock objects specifically:

    • there's no inherent way to tell that a given lock is for anything in particular, other than meta-information such as its name, comments, or documentation; conversely, locks are free to be used for whatever synchronization needs you may have
    • a class or container can be made to manage both the lock and whatever it should be synchronizing; a normal multiprocessing.Manager (SyncManager)'s .list() and .dict() do this, and a variety of other useful constructs exist, such as Pipe and Queue
    • one lock can be used to synchronize any number of actions, but a single shared lock can block access to unrelated resources unnecessarily, so using more locks can be a worthwhile trade-off
    • a variety of synchronization primitives also exist for different purposes

    value = my_queue.get()  # Queue.get() is already synchronized
    
    # acquire() returns False if the lock cannot be acquired within the timeout
    if not my_lock1.acquire(timeout=5):
        raise CustomException("failed to acquire my_lock1 after 5 seconds")
    try:
        with my_lock2:  # blocks until acquired, released when the block exits
            some_shared_mutable = some_container_of_mutables[-1]
            some_other_shared_mutable = some_object_with_mutables.get()
            if foo(value, some_shared_mutable):
                action1(value, some_shared_mutable)
                action2(value, some_other_shared_mutable)
    finally:
        my_lock1.release()  # release the lock that was acquired manually above
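
To make the "class that manages both the lock and the data" idea from the bullets concrete, here is one possible sketch. The class name `GuardedList`, its methods, and the `drain` worker are all hypothetical; the point is only that each lock travels with the list it protects, so nothing relies on matching variable names:

```python
from multiprocessing import Manager, Process


class GuardedList:
    """Bundle a list-like object with the one lock that protects it."""

    def __init__(self, items, lock):
        self._items = items
        self._lock = lock

    def append(self, value):
        with self._lock:
            self._items.append(value)

    def pop_first(self, default=None):
        # The emptiness check and the pop happen under the same lock.
        with self._lock:
            if len(self._items) == 0:
                return default
            return self._items.pop(0)

    def snapshot(self):
        with self._lock:
            return list(self._items)  # plain copy of the current contents


def drain(source, sink):
    """Move doubled items from source to sink until source is empty."""
    while True:
        item = source.pop_first()
        if item is None:
            break
        sink.append(item * 2)


def main():
    with Manager() as manager:
        source = GuardedList(manager.list(range(10)), manager.Lock())
        sink = GuardedList(manager.list(), manager.Lock())
        workers = [Process(target=drain, args=(source, sink)) for _ in range(2)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print(sorted(sink.snapshot()))


if __name__ == "__main__":
    main()
```

Using one lock per list, as here, lets a process working on `source` avoid blocking one that only touches `sink`, which is the trade-off mentioned in the bullets.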