Search code examples
pythonsetgraph-theorygraph-traversal

Finding loops between numbers in a list of sets


Given a list of sets like:

sets=[{1,2},{2,3},{1,3}]

the product (1,2,3) will be generated twice in itertools.product(*sets), as the literals (1,2,3) and (2,3,1), because there is a loop. If there is no loop there will be no duplication, even though there might be lots of commonality between sets.

A loop is formed to A in a set when you travel to B in the same set and then to B in another set that has A or to B in another set with C which connects to a set with A. e.g. 1>2--2>3--3>1 where '--' indicates movement between sets and '>' indicates movement within the set. The smallest loop would involve a pair of numbers in common between two sets, e.g. a>b--b>a. {edit: @ravenspoint's notation is nice, I suggest using {a}-b-{a} instead of the above.} Loops in canonical form should not have a bridging value used: either this represents a case where the loop traced back on itself (like in a cursive "i") or there is a smaller loops that could be made (like the outer and inner squares on the Arch de Triumph](https://commons.wikimedia.org/wiki/File:Paris_Arc_de_Triomphe.jpg).

What type of graph structure could I be using to represent this? I have tried representing each set as a node and then indicating which sets are connected to which, but this is not right since for [{1,2},{1,3},{1,4}], there is a connection between all sets -- the common 1-- but there is no loop. I have also tried assigning a letter to each number in each set, but that doesn't seem right, either, since then I don't know how to discriminate against loops within a set.

This was motivated by this question about generating unique products.

Sample sets like the following (which has the trivial loop 4>17--17>4 and longer loops like 13>5--5>11--11>13)

[{1, 13, 5}, {11, 13}, {17, 11, 4, 5}, {17, 4, 1}]

can be generated as shown in the docstring of core.

Alternate visualization analogy

Another way to visualize the "path/loop" is to think of coloring points on a grid: columns contain elements of the sets and equal elements are in the same row. A loop is a path that starts at one point and ends at the same point by travelling vertically or horizontally from point to point and must include both directions of motion. A suitable permutation of rows and columns would reveal a staircase polygon.


Solution

  • To form a loop you must travel from one number ( A ) in a set to another number ( B ) in the set, then to the same number ( B ) in another set

    So, we cannot just connect pairs of sets that share one or more numbers. We must connect a pair of sets that share one or more numbers with one or more edges that are labelled with the number that is used to connect them. Now we insist that when we travel through a node, we cannot use an out edge with the same label as the in edge that was used.

    Here is the code to implement this graph generator https://codeberg.org/JamesBremner/so79339486/src/commit/f53b941cd6561f01c81af1194ff5922289174157/src/main.cpp#L32-L50

    void genGraph()
    {
        raven::set::cRunWatch aWatcher("genGraph");
        theGD.g.clear();
        for (int is1 = 0; is1 < theSets.size(); is1++)
        {
            for (int is2 = is1 + 1; is2 < theSets.size(); is2++)
            {
                for (int n1 : theSets[is1])
                    for (int n2 : theSets[is2])
                        if (n1 == n2)
                            theGD.setEdgeWeight(
                                theGD.g.add(
                                    "set" + std::to_string(is1),
                                    "set" + std::to_string(is2)),
                                n1);
            }
        }
    }
    

    for the sample D = [{1, 2, 3, 4, 5, 6, 7, 8}, ...

    the graph looks like:

    enter image description here

    The graph generation is done in less than a millisecond

    raven::set::cRunWatch code timing profile
    Calls           Mean (secs)     Total           Scope
           1        0.000856        0.000856        genGraph
    

    The algorithm to find cycles ( == loops ) in this problem is a modified depth first search. Two modifications are required.

    1. The BFS cannot travel along two successive edges with the same label

    2. When a previously visited vertex is reached, the Dijsktra algorithm is applied to find the path that forms the shortest cycle that leads back to the back edge of the previously visited vertex.

    Implementing this sort of thing in Python is not recommended - the performance is painfully slow.

    Here is some C++ code to implement the modified DFS to prevent using edges with the same label twice in a row.

    https://codeberg.org/JamesBremner/so79339486/src/branch/main/src/cycleFinder.cpp

    The output, when run on example D, is

    24 loops found
    set0 -1- set9 -0- set2 -0- set7 -8- set0
    set0 -1- set9 -0- set2 -0- set3 -31- set6 -5- set0
    set0 -1- set9 -0- set2 -0- set5 -6- set0
    set7 -0- set2 -0- set5 -52- set7
    set9 -0- set2 -0- set5 -48- set9
    set0 -1- set9 -0- set2 -0- set4 -7- set0
    set7 -0- set2 -0- set4 -15- set7
    set8 -0- set9 -0- set2 -0- set4 -41- set8
    set9 -0- set2 -0- set4 -40- set9
    set0 -1- set9 -0- set2 -0- set3 -3- set0
    set7 -0- set2 -0- set3 -34- set7
    set8 -0- set9 -0- set2 -0- set3 -37- set8
    set8 -0- set9 -0- set2 -23- set8
    set0 -1- set9 -0- set2 -11- set1 -4- set0
    set4 -0- set2 -0- set9 -1- set0 -4- set1 -15- set4
    set5 -0- set2 -0- set9 -1- set0 -4- set1 -16- set5
    set7 -0- set2 -0- set9 -1- set0 -4- set1 -15- set7
    set8 -0- set9 -1- set0 -4- set1 -14- set8
    set9 -1- set0 -4- set1 -17- set9
    set4 -0- set2 -0- set3 -32- set4
    set4 -0- set2 -0- set5 -39- set4
    set6 -5- set0 -1- set9 -0- set2 -0- set5 -47- set6
    set6 -5- set0 -1- set9 -0- set2 -0- set7 -58- set6
    set8 -0- set9 -0- set2 -0- set7 -63- set8
    raven::set::cRunWatch code timing profile
    Calls           Mean (secs)     Total           Scope
           1        0.0088616       0.0088616       cycleFinder
           1        0.0009284       0.0009284       genGraph
    

    -N- means that the loop has taken N as the common number to travel between sets.

    The cycle finder takes 10 milliseconds on this problem. That seems plenty fast enough, but a simple optimization might be well worthwhile if you have problems larger than this. I believe that the presence of one loop means that the number sets fail. If so, then we could abort the loop finding as soon as one loop is found.


    Another small test problem {1, 2, 4}, {2, 3}, {1, 3, 5}, {4, 5}

    enter image description here

    2 loops found
    set0 -2- set1 -3- set2 -1- set0
    set3 -4- set0 -1- set2 -5- set3
    raven::set::cRunWatch code timing profile
    Calls           Mean (secs)     Total           Scope
           1        0.0007017       0.0007017       cycleFinder
           1        0.0003599       0.0003599       genGraph