Search code examples
pythonlistmergeboolean-expressionconnected-components

Merge lists that share common elements


My input is a list of lists. Some of them share common elements, eg.

L = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]

I need to merge all lists, that share a common element, and repeat this procedure as long as there are no more lists with the same item. I thought about using boolean operations and a while loop, but couldn't come up with a good solution.

The final result should be:

L = [['a','b','c','d','e','f','g','o','p'],['k']] 

Solution

  • You can see your list as a notation for a Graph, ie ['a','b','c'] is a graph with 3 nodes connected to each other. The problem you are trying to solve is finding connected components in this graph.

    You can use NetworkX for this, which has the advantage that it's pretty much guaranteed to be correct:

    l = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]
    
    import networkx 
    from networkx.algorithms.components.connected import connected_components
    
    
    def to_graph(l):
        G = networkx.Graph()
        for part in l:
            # each sublist is a bunch of nodes
            G.add_nodes_from(part)
            # it also imlies a number of edges:
            G.add_edges_from(to_edges(part))
        return G
    
    def to_edges(l):
        """ 
            treat `l` as a Graph and returns it's edges 
            to_edges(['a','b','c','d']) -> [(a,b), (b,c),(c,d)]
        """
        it = iter(l)
        last = next(it)
    
        for current in it:
            yield last, current
            last = current    
    
    G = to_graph(l)
    print connected_components(G)
    # prints [['a', 'c', 'b', 'e', 'd', 'g', 'f', 'o', 'p'], ['k']]
    

    To solve this efficiently yourself you have to convert the list into something graph-ish anyways, so you might as well use networkX from the start.