Search code examples
pythonlistdictionarypattern-matchingpython-itertools

How to merge multiple dictionaries from separate lists if they share any key-value pairs?


How to combine dictionaries from multiple lists if they share a common key-value pair?

For example, here are three lists of dictionaries:

l1 = [{'fruit':'banana','category':'B'},{'fruit':'apple','category':'A'}]
l2 = [{'type':'new','category':'A'},{'type':'old','category':'B'}]
l3 = [{'order':'2','type':'old'},{'order':'1','type':'new'}]

Desired Result:

l = [{'fruit':'apple','category':'A','order':'1','type':'new'},{'fruit':'banana','category':'B','order':'2','type':'old'}]

The tricky part is that I want this function to only take in the lists as arguments and not the keys, because I want to only have to plug in any number of list of dictionaries and not be concerned by which key-names are the overlapping ones (in this case they key-names that bring all three together are 'category' and 'type').

I should note index shouldn't matter, as it only should be based on common elements.

Here's my attempt:

def combine_lists(*args):
    base_list = args[0]
    L = []
    for sublist in args[1:]:
        L.extend(sublist)
    for D in base_list:
        for Dict in L:
            if any([tup in Dict.items() for tup in D.items()]): 
                D.update(Dict)
    return base_list

Solution

  • For this problem it is convenient to regard the dicts as lists of tuples:

    In [4]: {'fruit':'apple','category':'A'}.items()
    Out[4]: [('category', 'A'), ('fruit', 'apple')]
    

    Since we wish to connect dicts which share a key-value pair, we can regard each tuple as a node in a graph and pairs of tuples as edges. Once you have a graph the problem is reduced to finding the graph's connected components.

    Using networkx,

    import itertools as IT
    import networkx as nx
    
    l1 = [{'fruit':'apple','category':'A'},{'fruit':'banana','category':'B'}]
    l2 = [{'type':'new','category':'A'},{'type':'old','category':'B'}]
    l3 = [{'order':'1','type':'new'},{'order':'2','type':'old'}]
    
    data = [l1, l2, l3]
    G = nx.Graph()
    for dct in IT.chain.from_iterable(data):
        items = list(dct.items())
        node1 = node1[0]
        for node2 in items:
            G.add_edge(node1, node22)
    
    for cc in nx.connected_component_subgraphs(G):
        print(dict(IT.chain.from_iterable(cc.edges())))
    

    yields

    {'category': 'A', 'fruit': 'apple', 'type': 'new', 'order': '1'}
    {'category': 'B', 'fruit': 'banana', 'type': 'old', 'order': '2'}
    

    If you wish to remove the networkx dependency, you could use, for example, pillmuncher's implementation:

    import itertools as IT
    
    def connected_components(neighbors):
        """
        https://stackoverflow.com/a/13837045/190597 (pillmuncher)
        """
        seen = set()
        def component(node):
            nodes = set([node])
            while nodes:
                node = nodes.pop()
                seen.add(node)
                nodes |= neighbors[node] - seen
                yield node
        for node in neighbors:
            if node not in seen:
                yield component(node)
    
    l1 = [{'fruit':'apple','category':'A'},{'fruit':'banana','category':'B'}]
    l2 = [{'type':'new','category':'A'},{'type':'old','category':'B'}]
    l3 = [{'order':'1','type':'new'},{'order':'2','type':'old'}]
    
    data = [l1, l2, l3]
    G = {}
    for dct in IT.chain.from_iterable(data):
        items = dct.items()
        node1 = items[0]
        for node2 in items[1:]:
            G.setdefault(node1, set()).add(node2)
            G.setdefault(node2, set()).add(node1)
    
    for cc in connected_components(G):
        print(dict(cc))
    

    which prints the same result as above.