
How to use ThreadPoolExecutor's map() with a nested for loop in Python


I have two dictionaries:

data1 = {
  "key": [
    {
      "id": "key1",
      "name": "key1"
    },
    {
      "id": "key2",
      "name": "key2"
    },
    {
      "id": "key3",
      "name": "key3"
    },
  ]
}

data2 = {
  "key": [
    {
      "id": "key2"
      "name": "TEST key2"
    },
    {
      "id": "key1",
      "name": "TEST key1"
    },
  ]
}

I am building a list of tuples of the objects inside the "key" lists of data1 and data2 that have matching ids, using the code below:


common_keys = [
    (each_data1_key, each_data2_key)
    for each_data1_key in data1.get("key", [])
    for each_data2_key in data2.get("key", [])
    if each_data1_key.get("id") == each_data2_key.get("id")
]

# Example result = [({"id":"key1", "name": "key1"}, {"id": "key1", "name": "TEST key1"}), ...]

Now I want to process these tuples further in ThreadPoolExecutor's map function. Currently, I am using the code below:

from concurrent.futures import ThreadPoolExecutor

def func(object1, object2):
    """
    func is run in a thread to do some task in parallel with object1 and object2
    """
    <SOME CODE HERE> ...

def myfunc(common_keys):
    if common_keys:
        with ThreadPoolExecutor(max_workers=10) as executor:
            executor.map(lambda x: func(*x), common_keys)

# func is a function that accepts 2 objects as parameters;
# we send each tuple of objects to a thread to process some task

My task is to optimize the code by reducing the looping (I have used a nested for loop to find the common_keys list).

Could anyone help me find a solution where, in order to get the list of tuples of the objects having the same id, I don't need to use a nested loop (or another, more optimized way)?


Solution

  • Building on https://stackoverflow.com/a/18554039/9981846, if you have some memory to spare, you can make your ids dictionary keys to benefit from fast set-like operations on them later.

    # Loop once for each list
    dict1 = {item["id"]: item for item in data1.get("key", [])}
    dict2 = {item["id"]: item for item in data2.get("key", [])}
    
    # Set intersection is fast
    common_keys = [(dict1[key], dict2[key])
                   for key
                   in dict1.keys() & dict2.keys()]
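
    For illustration, with the sample data from the question, the key intersection and the resulting pairs would look roughly like this (set ordering is not guaranteed):

    dict1.keys() & dict2.keys()   # -> {'key1', 'key2'}

    # common_keys then holds the two matched pairs, e.g.
    # [({'id': 'key1', 'name': 'key1'}, {'id': 'key1', 'name': 'TEST key1'}),
    #  ({'id': 'key2', 'name': 'key2'}, {'id': 'key2', 'name': 'TEST key2'})]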
    

    Also, if you pass the dictionaries to myfunc instead of common_keys, you can avoid creating that list by using a generator.

    def func(object1, object2):
        print(f"Got 1: {object1}, and 2: {object2}")
    
    
    def generate_pairs(d1, d2):
        for key in d1.keys() & d2.keys():
            yield d1[key], d2[key]
    
    
    def myfunc(d1, d2):
        if d1 and d2:
            with ThreadPoolExecutor(max_workers=10) as executor:
                executor.map(lambda x: func(*x), generate_pairs(d1, d2))
    
    
    myfunc(dict1, dict2)
    >>> Got object1: {'id': 'key2', 'name': 'key2'}, object2: {'id': 'key2', 'name': 'TEST key2'}
    >>> Got object1: {'id': 'key1', 'name': 'key1'}, object2: {'id': 'key1', 'name': 'TEST key1'}
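
    Note that executor.map returns an iterator of results in input order, so if func returns a value you can collect it; a minimal sketch of that variant (still using the generate_pairs above):

    def myfunc(d1, d2):
        with ThreadPoolExecutor(max_workers=10) as executor:
            # list() drains the result iterator while the pool is still open
            results = list(executor.map(lambda x: func(*x), generate_pairs(d1, d2)))
        return results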
    

    Finally, to keep the speed while saving memory compared to the above, you could build only the smaller of the two dictionaries (the extra memory is then proportional to the smaller input, while a single pass over the bigger list keeps the work roughly linear), passing the "key" lists to the generator:

    def generate_pairs(l1, l2):
        little, big = (l1, l2) if (len(l1) < len(l2)) else (l2, l1)
        small_dict = {item["id"]: item for item in little}

        # loop once over the bigger list; dict lookups are O(1) on average
        for key_data_2 in big:
            key_data_1 = small_dict.get(key_data_2["id"])
            if key_data_1 is not None:
                yield key_data_1, key_data_2
    
    
    # with the same `myfunc`, except for the parameter types
    def myfunc(l1, l2):
        if l1 and l2:
            with ThreadPoolExecutor(max_workers=10) as executor:
                executor.map(lambda x: func(*x), generate_pairs(l1, l2))
    
    
    # and you'd call 
    myfunc(data1.get("key", []), data2.get("key", []))