Python defaultdict(lambda: None) without lambda


I have code that uses a ProcessPoolExecutor, which has to pickle everything it sends to the worker processes, and lambdas (like functions defined inside other functions) can't be pickled. Some of the code I want to run in parallel uses a defaultdict whose default value is None.

How would you proceed? If possible, I'd rather not change the parallelizing code.

What I have:

import itertools
import os
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor

class SomeClass:
    def __init__(self):
        self.some_dict = defaultdict(lambda: None)

    def generate(self):
        ...  # some code

def some_method_to_parallelize(x: SomeClass):
    ...  # some code

def some_method():
    max_workers = round(os.cpu_count() // 1.5)
    invocations_per_process = 100
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # Each SomeClass() argument is pickled to reach a worker process.
        data = [executor.submit(some_method_to_parallelize, SomeClass()) for _ in range(invocations_per_process)]
        data = list(itertools.chain.from_iterable([r.result() for r in data]))
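
The failure can be reproduced without starting any worker processes. ProcessPoolExecutor pickles the arguments passed to submit() before handing them to a worker, so the sketch below simply pickles a SomeClass instance directly. The is_picklable helper is hypothetical, written only for this illustration, and the exact exception raised for the lambda varies between Python versions.

```python
import pickle
from collections import defaultdict


class SomeClass:
    # Trimmed-down copy of the question's class; generate() is omitted.
    def __init__(self):
        # The default factory is a lambda, exactly as in the question.
        self.some_dict = defaultdict(lambda: None)


def is_picklable(obj) -> bool:
    """Hypothetical helper: True if obj survives pickle.dumps, as submit() arguments must."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exception type for an unpicklable lambda differs between Python versions.
        return False
    return True


print(is_picklable(SomeClass()))       # False: the lambda factory cannot be pickled
print(is_picklable(defaultdict(int)))  # True: int is a picklable default factory
```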
    

Solution

  • Try:

    collections.defaultdict(type(None))
    

    That gives you NoneType, the class of None, to use as your defaultdict's default_factory. Calling NoneType() with no arguments returns None, and unlike a lambda, NoneType can be pickled on Python 3, so objects holding this defaultdict can be passed to the executor.
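
A minimal sketch, assuming Python 3, that checks both halves of the claim above: missing keys come back as None, and the defaultdict survives a pickle round trip with NoneType still as its default_factory. The local name NoneType is just an alias chosen here for readability.

```python
import pickle
from collections import defaultdict

# NoneType is the class of None; calling it with no arguments returns None.
NoneType = type(None)

some_dict = defaultdict(NoneType)
print(some_dict["missing"])  # None: the factory was called for the missing key
print(dict(some_dict))       # {'missing': None}: the default is stored, as with the lambda

# Unlike a lambda, NoneType can be pickled, so the dict can be sent to a worker.
restored = pickle.loads(pickle.dumps(some_dict))
print(restored.default_factory is NoneType)  # True
print(restored["other"])                     # None
```

On Python 3.10 and later, types.NoneType names the same class, so defaultdict(types.NoneType) is equivalent. A named function defined at module level (for example a hypothetical def default_none(): return None) would also work, because pickle stores such functions by their qualified name; only lambdas and functions defined inside other functions fail.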