
Python multiprocessing with multiple locks slower than single lock


I am experimenting with multiprocessing in Python. I wrote some code that requires concurrent modification of 3 different variables (a dict, a float and an int), shared across the different processes. My understanding of how locking works tells me that if I have 3 different shared variables, it will be more efficient to assign a lock to each one. After all, why should process 2 wait to modify variable A just because process 1 is modifying variable B? It makes sense to me that if you need to lock variable B, then A should still be accessible to other processes. I ran the 2 toy examples below, based on a real program I'm writing, and to my surprise the code runs faster with a single lock!

Single lock: 2.1 seconds

import multiprocessing as mp
import numpy as np
import time

class ToyClass:
    def __init__(self, shared_a, shared_b):
        self.a = shared_a
        self.b = shared_b

    def update_a(self, key, n, lock):
        with lock:
            if key not in self.a:
                self.a[key] = np.zeros(4)
            self.a[key][n] += 1

    def update_b(self, lock):
        with lock:
            self.b.value = max(0.1, self.b.value - 0.01)

def run_episode(toy, counter, lock):
    key = np.random.randint(100)
    n = np.random.randint(4)
    toy.update_a(key, n, lock)
    toy.update_b(lock)
    with lock:
        counter.value += 1

if __name__ == "__main__":
    num_episodes = 1000
    num_processes = 4

    t0 = time.time()

    with mp.Manager() as manager:
        shared_a = manager.dict()
        shared_b = manager.Value('d', 0)
        counter = manager.Value('i', 0)

        toy = ToyClass(shared_a=shared_a, shared_b=shared_b)

        # Single lock
        lock = manager.Lock()

        pool = mp.Pool(processes=num_processes)

        for _ in range(num_episodes):
            pool.apply_async(run_episode, args=(toy, counter, lock))

        pool.close()
        pool.join()

    tf = time.time()

    print(f"Time to compute single lock: {tf - t0} seconds")

Multiple locks: 2.85 seconds!!

import multiprocessing as mp
import numpy as np
import time


class ToyClass:  ## Same definition as for single lock
    def __init__(self, shared_a, shared_b):
        self.a = shared_a
        self.b = shared_b

    def update_a(self, key, n, lock):
        with lock:
            if key not in self.a:
                self.a[key] = np.zeros(4)
            self.a[key][n] += 1 

    def update_b(self, lock):
        with lock:
            self.b.value = max(0.1, self.b.value - 0.01)

def run_episode(toy, counter, lock_a, lock_b, lock_count):
    key = np.random.randint(100)
    n = np.random.randint(4)
    toy.update_a(key, n, lock_a)
    toy.update_b(lock_b)
    with lock_count:
        counter.value += 1

if __name__ == "__main__":
    num_episodes = 1000
    num_processes = 4

    t0 = time.time()

    with mp.Manager() as manager:
        shared_a = manager.dict()
        shared_b = manager.Value('d', 0)
        counter = manager.Value('i', 0)

        toy = ToyClass(shared_a=shared_a, shared_b=shared_b)

        # 3 locks for 3 shared variables
        lock_a = manager.Lock()
        lock_b = manager.Lock()
        lock_count = manager.Lock()

        pool = mp.Pool(processes=num_processes)

        for _ in range(num_episodes):
            pool.apply_async(run_episode, args=(toy, counter, lock_a, lock_b, lock_count))

        pool.close()
        pool.join()

    tf = time.time()

    print(f"Time to compute multi-lock: {tf - t0} seconds")

What am I missing here? Is there a computational overhead when switching between locks that outweighs any potential benefit? These are just flags, how can it be?

Note: I know the code runs much faster in a single process/thread, but this is part of an experiment precisely to understand the downsides of multiprocessing.


Solution

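  • This has nothing to do with the locking itself. You are sending 3 locks per call instead of 1, so each call carries roughly 3 times the transmission overhead: every argument passed to apply_async is serialized and sent to a worker process.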

    To verify this, you can run two tests:

    1. Keep sending the 3 locks but use only 1 of them. You will get the same time as when using all 3 locks.
    2. Replace 2 of the locks with simple Manager.Value objects. The time is still the same as with 3 locks.
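A minimal sketch of the first test, assuming a no-op task so that only the dispatch cost is measured. The names use_first_lock and time_dispatch are hypothetical helpers, not part of the question's code, and absolute timings will vary by machine.

```python
import multiprocessing as mp
import time


def use_first_lock(*locks):
    # Only the first lock is ever acquired; any others are sent but unused.
    with locks[0]:
        pass


def time_dispatch(pool, locks, num_tasks):
    """Return the seconds taken to run num_tasks calls that each receive `locks`."""
    start = time.perf_counter()
    results = [pool.apply_async(use_first_lock, args=locks) for _ in range(num_tasks)]
    for r in results:
        r.get()  # wait for completion and surface any worker error
    return time.perf_counter() - start


if __name__ == "__main__":
    with mp.Manager() as manager, mp.Pool(processes=4) as pool:
        one = (manager.Lock(),)
        three = (manager.Lock(), manager.Lock(), manager.Lock())
        print(f"1 lock sent, 1 used:  {time_dispatch(pool, one, 1000):.2f} s")
        print(f"3 locks sent, 1 used: {time_dispatch(pool, three, 1000):.2f} s")
```

If the explanation above holds, the second figure should be noticeably larger even though both runs acquire exactly one lock per task.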

    The locking itself plays no role here. You are just sending the locks over and over, which you can avoid by passing them once per worker through an initializer when creating the pool.

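    # Module-level placeholders; each worker process fills them in once.
    lock_a = None
    lock_b = None
    lock_count = None

    def initialize_locks(val1, val2, val3):
        # Runs once in every worker when the pool starts it.
        global lock_a, lock_b, lock_count
        lock_a = val1
        lock_b = val2
        lock_count = val3

    ...

    # The locks are sent once per worker here, not once per task.
    pool = mp.Pool(processes=num_processes, initializer=initialize_locks, initargs=(lock_a, lock_b, lock_count))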
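For completeness, here is a hedged sketch of how the worker side might look once the locks live in globals: the task function no longer takes any lock arguments. The run wrapper is a hypothetical helper, and the dict holds plain integer counts instead of NumPy arrays to keep the example dependency-free; neither comes from the original code.

```python
import multiprocessing as mp
import random

# Module-level placeholders; each worker fills them in via the initializer.
lock_a = None
lock_b = None
lock_count = None


def initialize_locks(val1, val2, val3):
    global lock_a, lock_b, lock_count
    lock_a, lock_b, lock_count = val1, val2, val3


def run_episode(shared_a, shared_b, counter):
    # The locks come from the worker's globals, not from the task arguments.
    key = random.randrange(100)
    with lock_a:
        # Read, modify, and write back so the manager sees the update.
        shared_a[key] = shared_a.get(key, 0) + 1
    with lock_b:
        shared_b.value = max(0.1, shared_b.value - 0.01)
    with lock_count:
        counter.value += 1


def run(num_episodes=1000, num_processes=4):
    with mp.Manager() as manager:
        shared_a = manager.dict()
        shared_b = manager.Value('d', 0)
        counter = manager.Value('i', 0)
        locks = (manager.Lock(), manager.Lock(), manager.Lock())
        with mp.Pool(processes=num_processes, initializer=initialize_locks,
                     initargs=locks) as pool:
            results = [pool.apply_async(run_episode, args=(shared_a, shared_b, counter))
                       for _ in range(num_episodes)]
            for r in results:
                r.get()  # wait for completion and surface any worker error
        # Copy the results out before the manager shuts down.
        return counter.value, shared_b.value, shared_a.copy()


if __name__ == "__main__":
    count, b, a = run()
    print(count, b, sum(a.values()))
```

The shared dict and values are still passed per task in this sketch; they could be moved into the initializer in the same way as the locks.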
    

    Also, if you are using the initializer, you should use multiprocessing.Lock rather than Manager.Lock, as it is faster. The same applies to multiprocessing.Value versus Manager.Value.
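A rough sketch of that last suggestion, under the assumption that only the float and the counter are needed; the dict has no direct multiprocessing equivalent and would still require a manager. The names initialize_worker and run are hypothetical. These objects cannot be passed as apply_async arguments, so both the locks and the values go through the initializer.

```python
import multiprocessing as mp

# Module-level placeholders; each worker fills them in via the initializer.
shared_b = None
counter = None
lock_b = None
lock_count = None


def initialize_worker(b, c, lb, lc):
    global shared_b, counter, lock_b, lock_count
    shared_b, counter, lock_b, lock_count = b, c, lb, lc


def run_episode():
    # Everything shared is reached through the worker's globals.
    with lock_b:
        shared_b.value = max(0.1, shared_b.value - 0.01)
    with lock_count:
        counter.value += 1


def run(num_episodes=1000, num_processes=4):
    # lock=False gives raw shared values; the explicit locks below guard them.
    b = mp.Value('d', 0.0, lock=False)
    c = mp.Value('i', 0, lock=False)
    initargs = (b, c, mp.Lock(), mp.Lock())
    with mp.Pool(processes=num_processes, initializer=initialize_worker,
                 initargs=initargs) as pool:
        results = [pool.apply_async(run_episode) for _ in range(num_episodes)]
        for r in results:
            r.get()  # wait for completion and surface any worker error
    # The values live in shared memory, so the parent sees the workers' updates.
    return c.value, b.value


if __name__ == "__main__":
    print(run())
```

No manager process is involved here, so each update is a shared-memory access rather than a round trip to a server process.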