Search code examples
pythonsetgraph-theorygraph-traversal

Finding loops between numbers in a list of sets


I marked my answer as the answer because it is the one that is doing what I was wanting and anyone wanting to do the same thing should start there. But I would love to see a better answer (order of magnitude for the generation of all loops for example D shown) in Python and will happily select a better answer. The original OP follows.

Given a list of sets like:

sets=[{1,2},{2,3},{1,3}]

the product (1,2,3) will be generated twice in itertools.product(*sets), as the literals (1,2,3) and (2,3,1), because there is a loop. If there is no loop there will be no duplication, even though there might be lots of commonality between sets.

A loop is formed to A in a set when you travel to B in the same set and then to B in another set that has A or to B in another set with C which connects to a set with A. e.g. 1>2--2>3--3>1 where '--' indicates movement between sets and '>' indicates movement within the set. The smallest loop would involve a pair of numbers in common between two sets, e.g. a>b--b>a. {edit: @ravenspoint's notation is nice, I suggest using {a}-b-{a} instead of the above.} Loops in canonical form should not have a bridging value used more than once: either this represents a case where the loop traced back on itself (like in a cursive "i") or there is a smaller loops that could be made (like the outer and inner squares on the Arch de Triumph](https://commons.wikimedia.org/wiki/File:Paris_Arc_de_Triomphe.jpg).

What type of graph structure could I be using to represent this? I have tried representing each set as a node and then indicating which sets are connected to which, but this is not right since for [{1,2},{1,3},{1,4}], there is a connection between all sets -- the common 1-- but there is no loop. I have also tried assigning a letter to each number in each set, but that doesn't seem right, either, since then I don't know how to discriminate against loops within a set.

This was motivated by this question about generating unique products.

Sample sets like the following (which has the trivial loop 4>17--17>4 and longer loops like 13>5--5>11--11>13)

[{1, 13, 5}, {11, 13}, {17, 11, 4, 5}, {17, 4, 1}]

can be generated as shown in the docstring of core.

Alternate visualization analogy

Another way to visualize the "path/loop" is to think of coloring points on a grid: columns contain elements of the sets and equal elements are in the same row. A loop is a path that starts at one point and ends at the same point by travelling vertically or horizontally from point to point and must include both directions of motion. A suitable permutation of rows and columns would reveal a staircase polygon.


Solution

  • In my comment to this answer I indicate a way to use edges to try to see if there is a loop. For some sets, however, this can take a long time. For example, the presence of a loop in the following is not detected until nearly 470k products of edges have been tested (about 6 seconds).

    D = [{1, 2, 3, 4, 5, 6, 7, 8}, {4, 11, 13, 14, 15, 16, 17}, {11, 20, 21, 23, 24, 28, 29}, {32, 34, 3, 37, 29, 30, 31}, {32, 39, 40, 41, 44, 7, 15, 20}, {6, 39, 47, 48, 16, 52, 53, 24}, {5, 47, 58, 61, 31}, {34, 8, 15, 52, 21, 58, 62, 63}, {37, 41, 14, 78, 23, 63}, {1, 40, 48, 82, 17, 23}]
    

    Alternatively, one can just generate products and keep sorted results and see if a duplicate is ever detected...maybe you will be lucky and find a duplicate right away, but maybe you won't. For example, these sets show a duplicate result after about the 24k product from product(*C) (where each result is put in a set as tuple(sorted(i)):

    C = [{1, 2, 3, 4, 5, 6}, {9, 12, 13, 14, 15, 16, 17, 18}, {2, 21, 22, 23, 24, 25, 27}, {35, 14, 21, 28, 30, 31}, {1, 35, 37, 38, 43, 12, 44, 25}, {44, 45, 15, 52, 53, 30}, {5, 45, 13, 22, 57, 58}, {37, 6, 18, 23, 59, 62}, {3, 16, 53, 59, 63}, {38, 9, 73, 24, 58, 28}]
    

    How to think of the graph structure

    I think this situation can be represented as a graph as follows:

    1. for each set identify the possible pairs
    2. the vertices are the set of possible pairs across all sets
    3. the edges are the connections between vertices that are not in the same set

    For example, for the sets {1,2,4},{2,3},{1,3} the pairs are

    [{(1,2),(1,4),(2,4)},{(2,3)},{(1,3)}]
    

    So the vertices are

    V = {
    (1,2):-1, (1,4):-2, (2,4):-3,
    (2,3):-4,
    (1,3):-5
    }
    

    The edges are

    E = [(-1,-4),(-1,-5),(-2,-5),(-3,-4),(-4,-5)]
    

    And now you can analyze this graph however you like.

    But it is not necessary to generate all the edges to find out if there is a loop. It might not even be a good idea to generate this graph structure. Consider the case for D.

    These are the edges

    [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (1, 16), (1, 17), (1, 4), (1, 11), (1, 13), (1, 14), (1, 15), (2, 20), (2, 21), (2, 23), (2, 24), (2, 11), (2, 28), (2, 29), (3, 32), (3, 34), (3, 3), (3, 37), (3, 29), (3, 30), (3, 31), (4, 32), (4, 7), (4, 40), (4, 41), (4, 39), (4, 44), (4, 15), (4, 20), (5, 6), (5, 39), (5, 47), (5, 48), (5, 16), (5, 52), (5, 53), (5, 24), (6, 5), (6, 58), (6, 31), (6, 61), (6, 47), (7, 34), (7, 8), (7, 15), (7, 52), (7, 21), (7, 58), (7, 62), (7, 63), (8, 23), (8, 37), (8, 78), (8, 41), (8, 14), (8, 63), (9, 48), (9, 1), (9, 17), (9, 82), (9, 23), (9, 40)]
    

    If we number them like this

    {(0, 1): 0, (0, 2): 1, (0, 3): 2, (0, 4): 3, (0, 5): 4, (0, 6): 5, (0, 7): 6, (0, 8): 7, (1, 4): 8, (1, 11): 9, (1, 13): 10, (1, 14): 11, (1, 15): 12, (1, 16): 13, (1, 17): 14, (2, 11): 15, (2, 20): 16, (2, 21): 17, (2, 23): 18, (2, 24): 19, (2, 28): 20, (2, 29): 21, (3, 3): 22, (3, 29): 23, (3, 30): 24, (3, 31): 25, (3, 32): 26, (3, 34): 27, (3, 37): 28, (4, 7): 29, (4, 15): 30, (4, 20): 31, (4, 32): 32, (4, 39): 33, (4, 40): 34, (4, 41): 35, (4, 44): 36, (5, 6): 37, (5, 16): 38, (5, 24): 39, (5, 39): 40, (5, 47): 41, (5, 48): 42, (5, 52): 43, (5, 53): 44, (6, 5): 45, (6, 31): 46, (6, 47): 47, (6, 58): 48, (6, 61): 49, (7, 8): 50, (7, 15): 51, (7, 21): 52, (7, 34): 53, (7, 52): 54, (7, 58): 55, (7, 62): 56, (7, 63): 57, (8, 14): 58, (8, 23): 59, (8, 37): 60, (8, 41): 61, (8, 63): 62, (8, 78): 63, (9, 1): 64, (9, 17): 65, (9, 23): 66, (9, 40): 67, (9, 48): 68, (9, 82): 69}
    

    We can identify the graph in which we are seeking loops

    {0: [1, 2, 3, 4, 5, 6, 7, 64], 1: [0, 2, 3, 4, 5, 6, 7], 2: [0, 1, 3, 4, 5, 6, 7, 22], 3: [0, 1, 2, 4, 5, 6, 7, 8], 4: [0, 1, 2, 3, 5, 6, 7, 45], 5: [0, 1, 2, 3, 4, 6, 7, 37], 6: [0, 1, 2, 3, 4, 5, 7, 29], 7: [0, 1, 2, 3, 4, 5, 6, 50], 13: [14, 8, 9, 10, 11, 12, 38], 14: [13, 8, 9, 10, 11, 12, 65], 8: [13, 14, 9, 10, 11, 12, 3], 9: [13, 14, 8, 10, 11, 12, 15], 10: [13, 14, 8, 9, 11, 12], 11: [13, 14, 8, 9, 10, 12, 58], 12: [13, 14, 8, 9, 10, 11, 30, 51], 16: [17, 18, 19, 15, 20, 21, 31], 17: [16, 18, 19, 15, 20, 21, 52], 18: [16, 17, 19, 15, 20, 21, 59, 66], 19: [16, 17, 18, 15, 20, 21, 39], 15: [16, 17, 18, 19, 20, 21, 9], 20: [16, 17, 18, 19, 15, 21], 21: [16, 17, 18, 19, 15, 20, 23], 26: [27, 22, 28, 23, 24, 25, 32], 27: [26, 22, 28, 23, 24, 25, 53], 22: [26, 27, 28, 23, 24, 25, 2], 28: [26, 27, 22, 23, 24, 25, 60], 23: [26, 27, 22, 28, 24, 25, 21], 24: [26, 27, 22, 28, 23, 25], 25: [26, 27, 22, 28, 23, 24, 46], 32: [29, 34, 35, 33, 36, 30, 31, 26], 29: [32, 34, 35, 33, 36, 30, 31, 6], 34: [32, 29, 35, 33, 36, 30, 31, 67], 35: [32, 29, 34, 33, 36, 30, 31, 61], 33: [32, 29, 34, 35, 36, 30, 31, 40], 36: [32, 29, 34, 35, 33, 30, 31], 30: [32, 29, 34, 35, 33, 36, 31, 12, 51], 31: [32, 29, 34, 35, 33, 36, 30, 16], 37: [40, 41, 42, 38, 43, 44, 39, 5], 40: [37, 41, 42, 38, 43, 44, 39, 33], 41: [37, 40, 42, 38, 43, 44, 39, 47], 42: [37, 40, 41, 38, 43, 44, 39, 68], 38: [37, 40, 41, 42, 43, 44, 39, 13], 43: [37, 40, 41, 42, 38, 44, 39, 54], 44: [37, 40, 41, 42, 38, 43, 39], 39: [37, 40, 41, 42, 38, 43, 44, 19], 45: [48, 46, 49, 47, 4], 48: [45, 46, 49, 47, 55], 46: [45, 48, 49, 47, 25], 49: [45, 48, 46, 47], 47: [45, 48, 46, 49, 41], 53: [50, 51, 54, 52, 55, 56, 57, 27], 50: [53, 51, 54, 52, 55, 56, 57, 7], 51: [53, 50, 54, 52, 55, 56, 57, 12, 30], 54: [53, 50, 51, 52, 55, 56, 57, 43], 52: [53, 50, 51, 54, 55, 56, 57, 17], 55: [53, 50, 51, 54, 52, 56, 57, 48], 56: [53, 50, 51, 54, 52, 55, 57], 57: [53, 50, 51, 54, 52, 55, 56, 62], 59: [60, 63, 61, 58, 62, 18, 66], 60: [59, 63, 61, 58, 62, 28], 63: [59, 60, 61, 58, 62], 61: [59, 60, 63, 58, 62, 35], 58: [59, 60, 63, 61, 62, 11], 62: [59, 60, 63, 61, 58, 57], 68: [64, 65, 69, 66, 67, 42], 64: [68, 65, 69, 66, 67, 0], 65: [68, 64, 69, 66, 67, 14], 69: [68, 64, 65, 66, 67], 66: [68, 64, 65, 69, 67, 18, 59], 67: [68, 64, 65, 69, 66, 34]}
    

    Finding the loops in that seems like it will be difficult and much information about the original problem has been lost. e.g. we are not interested in loops within a set.

    If you are just trying to see if there is a loop, it is much easier to do something like this:

    How to detect a loop

    For each set, see if there is a connection to another set that links back to the starting set to complete a loop. No x or y can be used more than twice and each x will have two y values that were used. This is how I am currently doing this.

    from collections import defaultdict
    
    def subsets(sets):
        """remove supersets from sets and retain 1 of any duplicates
    
        Examples
        ========
    
        >>> subsets([[1], [1], [1], [2, 3], [1, 2, 3], [4]]
        [[1], [4], [2, 3]]
    
        """
        size = defaultdict(list)
        for es in enumerate(map(set, sets)):
            n = len(es[1])
            if not any(es[1]==v for _,v in size[n]):
                size[n].append(es)
        s = [size[i] for i in reversed(sorted(size))]
        keep = defaultdict(list)
        for i in range(len(s)):
            for k,A in s[i]:
                do = True
                for j in range(len(s)-1, i, -1):
                    for _,a in s[j]:
                        if A.issuperset(a):
                            do = False
                            break
                    if not do:
                        break
                if do:
                    keep[i].append(sets[k])
        return [j for i in reversed(keep) for j in keep[i]]
    
    
    def find_loops(points, small=False):
        """
        Finds all unique loops in a set of points, where a loop defined by (x, y)
        tuples at the vertices of a staircase polygon (which may be folded). The
        function returns loops in a canonical form as every other point in the
        loop starting from the smallest (x,y) tuple, travelling in the ccw
        direction; the representative points are then sorted and added as a tuple
        to the set of all loops that is returned.
    
        For best results, remove an element from any set of length 1 and
        remove an element from a set if it is only used once, e.g. [{1,4},{1,2,3},{2,3}]
        would become [{},{2,3},{2,3}]. Do this recursively. If the empty lists are
        left in place then the x values of the loops will correspond to the
        indices of the original sets.
    
        Args:
            points (list of sets or list of tuples):
                - If a list of sets, each set represents the y-values for a specific x-coordinate.
                  For example, [{1, 2}, {2, 3}] means x=0 has y-values {1, 2}, and x=1 has y-values {2, 3}.
                - If a list of tuples, each tuple represents an (x, y) coordinate.
            small (default is False)
                when True, only loops having no subset are returned
    
        Returns:
            set of tuples: A set of canonical loop representations. Each loop is represented as a tuple
            of sorted (x, y) pairs
    
        Examples
        ========
    
        >>> dat = [{8, 9}, {9, 3, 5}, {8, 9, 3, 5}, {8, 5}]  # or [(0,9), (0,9), (1,9), etc...]
        >>> for i in sorted(find_loops(dat),key=lambda x:(len(x),x)):
        ...     print(i)
        ...
        ((0, 8), (2, 9))
        ((1, 3), (2, 5))
        ((1, 3), (2, 9))
        ((1, 5), (2, 9))
        ((2, 5), (3, 8))
        ((0, 8), (1, 9), (2, 3))
        ((0, 8), (1, 9), (2, 5))
        ((0, 8), (1, 9), (3, 5))
        ((0, 8), (2, 9), (3, 5))
        ((1, 3), (2, 8), (3, 5))
        ((1, 5), (2, 9), (3, 8))
        ((0, 8), (1, 3), (2, 9), (3, 5))
        ((0, 8), (1, 9), (2, 3), (3, 5))
    
        Using flag `small=True` will eliminate supersets from the loops:
    
        >>> for i in sorted(find_loops(dat, small=True),key=lambda x:(len(x),x)):
        ...     print(i)
        ...
        ((0, 8), (2, 9))
        ((1, 3), (2, 5))
        ((1, 3), (2, 9))
        ((1, 5), (2, 9))
        ((2, 5), (3, 8))
        ((0, 8), (1, 9), (2, 3))
        ((0, 8), (1, 9), (2, 5))
        ((0, 8), (1, 9), (3, 5))
        ((1, 3), (2, 8), (3, 5))
    
        Note that the subsets that are found will depend on the order of the
        sets. Both c and d below have 3 loops. But with the re-ordering in d,
        one of them will be a subset of another loop.
    
        >>> c = [{8, 7}, {9, 8, 7}, {9, 7}]
        >>> for i in find_loops(c): print (i)
        ...
        ((0, 8), (1, 7))
        ((0, 8), (1, 9), (2, 7))
        ((1, 7), (2, 9))
    
        >>> d = [{8, 7}, {9, 7}, {9, 8, 7}]  # second loop is subset of first
        ...
        ((0, 8), (1, 7), (2, 9))
        ((1, 7), (2, 9))
        ((0, 8), (2, 7))
    
        >>> find_loops([{3, 4}, {16, 4}, {24, 20}, {32, 3}, {32, 20}, {16, 24}])
        {{((0, 3), (1, 4), (2, 24), (3, 32), (4, 20), (5, 16))}
        >>> find_loops([{3, 4}])
        set()
        """
        loops = set()
    
        def Loop(l):
            if m := l.index(min(l)):
                # smallest (x,y) is at position m instead of 0
                if l[m-1][1] == l[m][1]:
                    # reverse the loop and update m
                    l = l[::-1]
                    m = -m - 1  # if it was at 1 it is not at -2
                l = l[m:] + l[:m]
    
            # put in canonical form
            L = l[::2]
            L.sort()
            L = tuple(L)
            # store if not already seen
            if L not in loops:
                loops.add(L)
    
        # Standardize input
        assert type(points[0]) in (list, set, tuple), '(x,y) are points {a,b} or [a,b] are sets
        if type(points[0]) is not tuple:
            points = [(x, y) for x, s in enumerate(points) for y in s]
    
        # build maps of where each x or y is located
        xmap = defaultdict(set)
        ymap = defaultdict(set)
        for x,y in points:
            xmap[x].add(y)
            ymap[y].add(x)
    
        # Recursive function to extend the path
        def extend_path(path, x0):
            y1 = path[-1][1]
    
            # Look for the next pair of points
            same_y = [
                (xi, y1) for xi in ymap[y1]
                if xi not in {p[0] for p in path}
            ]
            for x, y in same_y:
                same_x = [
                    (x, yi) for yi in xmap[x]
                    if yi not in {p[1] for p in path}
                ]
                for xi, yi in same_x:
                    # check for closure
                    if yi in xmap[x0]:
                        Loop(path + [(x, y), (xi, yi), (x0, yi)])
                    # extend
                    new_path = path + [(x, y), (xi, yi)]
                    extend_path(new_path, x0)
    
        # choose the first 3 points to form a right and up direction (ccw turn)
        # to limit how many path are pursued
        for x1, y1 in points:
            # Step 1: Find points with x2 > x1 and y2 = y1
            candidates_x2_y2 = [(x2, y2) for x2, y2 in points if x2 > x1 and y2 == y1]
            for x2, y2 in candidates_x2_y2:
                # Step 2: Find points with x3 = x2 and y3 > y2
                candidates_x3_y3 = [(x3, y3) for x3, y3 in points if x3 == x2 and y3 > y2]
                for x3, y3 in candidates_x3_y3:
                    # Step 3: Start path extension
                    path = [(x1, y1), (x2, y2), (x3, y3)]
                    if (x1, y3) in points:
                        # check for closure
                        Loop(path + [(x1, y3)])
                    # extend
                    extend_path(path, x1)
    
        return subsets(list(loops)) if small else loops
    

    For a point of reference consider these results:

    >>> D
    [{1, 2, 3, 4, 5, 6, 7, 8},
     {4, 11, 13, 14, 15, 16, 17},
     {11, 20, 21, 23, 24, 28, 29},
     {32, 34, 3, 37, 29, 30, 31},
     {32, 39, 40, 41, 7, 44, 15, 20},
     {6, 39, 47, 48, 16, 52, 53, 24},
     {5, 47, 58, 61, 31},
     {34, 8, 15, 52, 21, 58, 62, 63},
     {37, 41, 14, 78, 23, 63},
     {1, 40, 48, 17, 82, 23}]
    >>> len(find_loops(D,small=1))  # 12 seconds
    10440
    >>> len(find_loops(D))  # 3 seconds
    22261
    

    (For the indended use, these times are not significant.)