Given a list of sets like:
sets=[{1,2},{2,3},{1,3}]
the product (1,2,3)
will be generated twice in itertools.product(*sets)
, as the literals (1,2,3)
and (2,3,1)
, because there is a loop. If there is no loop there will be no duplication, even though there might be lots of commonality between sets.
A loop is formed to A in a set when you travel to B in the same set and then to B in another set that has A or to B in another set with C which connects to a set with A. e.g. 1>2--2>3--3>1
where '--' indicates movement between sets and '>' indicates movement within the set. The smallest loop would involve a pair of numbers in common between two sets, e.g. a>b--b>a
. {edit: @ravenspoint's notation is nice, I suggest using {a}-b-{a}
instead of the above.} Loops in canonical form should not have a bridging value used more than once: either this represents a case where the loop traced back on itself (like in a cursive "i") or there is a smaller loops that could be made (like the outer and inner squares on the Arch de Triumph](https://commons.wikimedia.org/wiki/File:Paris_Arc_de_Triomphe.jpg).
What type of graph structure could I be using to represent this? I have tried representing each set as a node and then indicating which sets are connected to which, but this is not right since for [{1,2},{1,3},{1,4}]
, there is a connection between all sets -- the common 1-- but there is no loop. I have also tried assigning a letter to each number in each set, but that doesn't seem right, either, since then I don't know how to discriminate against loops within a set.
This was motivated by this question about generating unique products.
Sample sets like the following (which has the trivial loop 4>17--17>4
and longer loops like 13>5--5>11--11>13
)
[{1, 13, 5}, {11, 13}, {17, 11, 4, 5}, {17, 4, 1}]
can be generated as shown in the docstring of core.
Another way to visualize the "path/loop" is to think of coloring points on a grid: columns contain elements of the sets and equal elements are in the same row. A loop is a path that starts at one point and ends at the same point by travelling vertically or horizontally from point to point and must include both directions of motion. A suitable permutation of rows and columns would reveal a staircase polygon.
In my comment to this answer I indicate a way to use edges to try to see if there is a loop. For some sets, however, this can take a long time. For example, the presence of a loop in the following is not detected until nearly 470k products of edges have been tested (about 6 seconds).
D = [{1, 2, 3, 4, 5, 6, 7, 8}, {4, 11, 13, 14, 15, 16, 17}, {11, 20, 21, 23, 24, 28, 29}, {32, 34, 3, 37, 29, 30, 31}, {32, 39, 40, 41, 44, 7, 15, 20}, {6, 39, 47, 48, 16, 52, 53, 24}, {5, 47, 58, 61, 31}, {34, 8, 15, 52, 21, 58, 62, 63}, {37, 41, 14, 78, 23, 63}, {1, 40, 48, 82, 17, 23}]
Alternatively, one can just generate products and keep sorted results and see if a duplicate is ever detected...maybe you will be lucky and find a duplicate right away, but maybe you won't. For example, these sets show a duplicate result after about the 24k product from product(*C)
(where each result is put in a set as tuple(sorted(i))
:
C = [{1, 2, 3, 4, 5, 6}, {9, 12, 13, 14, 15, 16, 17, 18}, {2, 21, 22, 23, 24, 25, 27}, {35, 14, 21, 28, 30, 31}, {1, 35, 37, 38, 43, 12, 44, 25}, {44, 45, 15, 52, 53, 30}, {5, 45, 13, 22, 57, 58}, {37, 6, 18, 23, 59, 62}, {3, 16, 53, 59, 63}, {38, 9, 73, 24, 58, 28}]
I think this situation can be represented as a graph as follows:
For example, for the sets {1,2,4},{2,3},{1,3} the pairs are
[{(1,2),(1,4),(2,4)},{(2,3)},{(1,3)}]
So the vertices are
V = {
(1,2):-1, (1,4):-2, (2,4):-3,
(2,3):-4,
(1,3):-5
}
The edges are
E = [(-1,-4),(-1,-5),(-2,-5),(-3,-4),(-4,-5)]
And now you can analyze this graph however you like.
But it is not necessary to generate all the edges to find out if there is a loop. It might not even be a good idea to generate this graph structure. Consider the case for D.
These are the edges
[(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (1, 16), (1, 17), (1, 4), (1, 11), (1, 13), (1, 14), (1, 15), (2, 20), (2, 21), (2, 23), (2, 24), (2, 11), (2, 28), (2, 29), (3, 32), (3, 34), (3, 3), (3, 37), (3, 29), (3, 30), (3, 31), (4, 32), (4, 7), (4, 40), (4, 41), (4, 39), (4, 44), (4, 15), (4, 20), (5, 6), (5, 39), (5, 47), (5, 48), (5, 16), (5, 52), (5, 53), (5, 24), (6, 5), (6, 58), (6, 31), (6, 61), (6, 47), (7, 34), (7, 8), (7, 15), (7, 52), (7, 21), (7, 58), (7, 62), (7, 63), (8, 23), (8, 37), (8, 78), (8, 41), (8, 14), (8, 63), (9, 48), (9, 1), (9, 17), (9, 82), (9, 23), (9, 40)]
If we number them like this
{(0, 1): 0, (0, 2): 1, (0, 3): 2, (0, 4): 3, (0, 5): 4, (0, 6): 5, (0, 7): 6, (0, 8): 7, (1, 4): 8, (1, 11): 9, (1, 13): 10, (1, 14): 11, (1, 15): 12, (1, 16): 13, (1, 17): 14, (2, 11): 15, (2, 20): 16, (2, 21): 17, (2, 23): 18, (2, 24): 19, (2, 28): 20, (2, 29): 21, (3, 3): 22, (3, 29): 23, (3, 30): 24, (3, 31): 25, (3, 32): 26, (3, 34): 27, (3, 37): 28, (4, 7): 29, (4, 15): 30, (4, 20): 31, (4, 32): 32, (4, 39): 33, (4, 40): 34, (4, 41): 35, (4, 44): 36, (5, 6): 37, (5, 16): 38, (5, 24): 39, (5, 39): 40, (5, 47): 41, (5, 48): 42, (5, 52): 43, (5, 53): 44, (6, 5): 45, (6, 31): 46, (6, 47): 47, (6, 58): 48, (6, 61): 49, (7, 8): 50, (7, 15): 51, (7, 21): 52, (7, 34): 53, (7, 52): 54, (7, 58): 55, (7, 62): 56, (7, 63): 57, (8, 14): 58, (8, 23): 59, (8, 37): 60, (8, 41): 61, (8, 63): 62, (8, 78): 63, (9, 1): 64, (9, 17): 65, (9, 23): 66, (9, 40): 67, (9, 48): 68, (9, 82): 69}
We can identify the graph in which we are seeking loops
{0: [1, 2, 3, 4, 5, 6, 7, 64], 1: [0, 2, 3, 4, 5, 6, 7], 2: [0, 1, 3, 4, 5, 6, 7, 22], 3: [0, 1, 2, 4, 5, 6, 7, 8], 4: [0, 1, 2, 3, 5, 6, 7, 45], 5: [0, 1, 2, 3, 4, 6, 7, 37], 6: [0, 1, 2, 3, 4, 5, 7, 29], 7: [0, 1, 2, 3, 4, 5, 6, 50], 13: [14, 8, 9, 10, 11, 12, 38], 14: [13, 8, 9, 10, 11, 12, 65], 8: [13, 14, 9, 10, 11, 12, 3], 9: [13, 14, 8, 10, 11, 12, 15], 10: [13, 14, 8, 9, 11, 12], 11: [13, 14, 8, 9, 10, 12, 58], 12: [13, 14, 8, 9, 10, 11, 30, 51], 16: [17, 18, 19, 15, 20, 21, 31], 17: [16, 18, 19, 15, 20, 21, 52], 18: [16, 17, 19, 15, 20, 21, 59, 66], 19: [16, 17, 18, 15, 20, 21, 39], 15: [16, 17, 18, 19, 20, 21, 9], 20: [16, 17, 18, 19, 15, 21], 21: [16, 17, 18, 19, 15, 20, 23], 26: [27, 22, 28, 23, 24, 25, 32], 27: [26, 22, 28, 23, 24, 25, 53], 22: [26, 27, 28, 23, 24, 25, 2], 28: [26, 27, 22, 23, 24, 25, 60], 23: [26, 27, 22, 28, 24, 25, 21], 24: [26, 27, 22, 28, 23, 25], 25: [26, 27, 22, 28, 23, 24, 46], 32: [29, 34, 35, 33, 36, 30, 31, 26], 29: [32, 34, 35, 33, 36, 30, 31, 6], 34: [32, 29, 35, 33, 36, 30, 31, 67], 35: [32, 29, 34, 33, 36, 30, 31, 61], 33: [32, 29, 34, 35, 36, 30, 31, 40], 36: [32, 29, 34, 35, 33, 30, 31], 30: [32, 29, 34, 35, 33, 36, 31, 12, 51], 31: [32, 29, 34, 35, 33, 36, 30, 16], 37: [40, 41, 42, 38, 43, 44, 39, 5], 40: [37, 41, 42, 38, 43, 44, 39, 33], 41: [37, 40, 42, 38, 43, 44, 39, 47], 42: [37, 40, 41, 38, 43, 44, 39, 68], 38: [37, 40, 41, 42, 43, 44, 39, 13], 43: [37, 40, 41, 42, 38, 44, 39, 54], 44: [37, 40, 41, 42, 38, 43, 39], 39: [37, 40, 41, 42, 38, 43, 44, 19], 45: [48, 46, 49, 47, 4], 48: [45, 46, 49, 47, 55], 46: [45, 48, 49, 47, 25], 49: [45, 48, 46, 47], 47: [45, 48, 46, 49, 41], 53: [50, 51, 54, 52, 55, 56, 57, 27], 50: [53, 51, 54, 52, 55, 56, 57, 7], 51: [53, 50, 54, 52, 55, 56, 57, 12, 30], 54: [53, 50, 51, 52, 55, 56, 57, 43], 52: [53, 50, 51, 54, 55, 56, 57, 17], 55: [53, 50, 51, 54, 52, 56, 57, 48], 56: [53, 50, 51, 54, 52, 55, 57], 57: [53, 50, 51, 54, 52, 55, 56, 62], 59: [60, 63, 61, 58, 62, 18, 66], 60: [59, 63, 61, 58, 62, 28], 63: [59, 60, 61, 58, 62], 61: [59, 60, 63, 58, 62, 35], 58: [59, 60, 63, 61, 62, 11], 62: [59, 60, 63, 61, 58, 57], 68: [64, 65, 69, 66, 67, 42], 64: [68, 65, 69, 66, 67, 0], 65: [68, 64, 69, 66, 67, 14], 69: [68, 64, 65, 66, 67], 66: [68, 64, 65, 69, 67, 18, 59], 67: [68, 64, 65, 69, 66, 34]}
Finding the loops in that seems like it will be difficult and much information about the original problem has been lost. e.g. we are not interested in loops within a set.
If you are just trying to see if there is a loop, it is much easier to do something like this:
For each set, see if there is a connection to another set that links back to the starting set to complete a loop. No x or y can be used more than twice and each x will have two y values that were used. This is how I am currently doing this.
from collections import defaultdict
def subsets(sets):
"""remove supersets from sets and retain 1 of any duplicates
Examples
========
>>> subsets([[1], [1], [1], [2, 3], [1, 2, 3], [4]]
[[1], [4], [2, 3]]
"""
size = defaultdict(list)
for es in enumerate(map(set, sets)):
n = len(es[1])
if not any(es[1]==v for _,v in size[n]):
size[n].append(es)
s = [size[i] for i in reversed(sorted(size))]
keep = defaultdict(list)
for i in range(len(s)):
for k,A in s[i]:
do = True
for j in range(len(s)-1, i, -1):
for _,a in s[j]:
if A.issuperset(a):
do = False
break
if not do:
break
if do:
keep[i].append(sets[k])
return [j for i in reversed(keep) for j in keep[i]]
def find_loops(points, small=False):
"""
Finds all unique loops in a set of points, where a loop defined by (x, y)
tuples at the vertices of a staircase polygon (which may be folded). The
function returns loops in a canonical form as every other point in the
loop starting from the smallest (x,y) tuple, travelling in the ccw
direction; the representative points are then sorted and added as a tuple
to the set of all loops that is returned.
For best results, remove an element from any set of length 1 and
remove an element from a set if it is only used once, e.g. [{1,4},{1,2,3},{2,3}]
would become [{},{2,3},{2,3}]. Do this recursively. If the empty lists are
left in place then the x values of the loops will correspond to the
indices of the original sets.
Args:
points (list of sets or list of tuples):
- If a list of sets, each set represents the y-values for a specific x-coordinate.
For example, [{1, 2}, {2, 3}] means x=0 has y-values {1, 2}, and x=1 has y-values {2, 3}.
- If a list of tuples, each tuple represents an (x, y) coordinate.
small (default is False)
when True, only loops having no subset are returned
Returns:
set of tuples: A set of canonical loop representations. Each loop is represented as a tuple
of sorted (x, y) pairs
Examples
========
>>> dat = [{8, 9}, {9, 3, 5}, {8, 9, 3, 5}, {8, 5}] # or [(0,9), (0,9), (1,9), etc...]
>>> for i in sorted(find_loops(dat),key=lambda x:(len(x),x)):
... print(i)
...
((0, 8), (2, 9))
((1, 3), (2, 5))
((1, 3), (2, 9))
((1, 5), (2, 9))
((2, 5), (3, 8))
((0, 8), (1, 9), (2, 3))
((0, 8), (1, 9), (2, 5))
((0, 8), (1, 9), (3, 5))
((0, 8), (2, 9), (3, 5))
((1, 3), (2, 8), (3, 5))
((1, 5), (2, 9), (3, 8))
((0, 8), (1, 3), (2, 9), (3, 5))
((0, 8), (1, 9), (2, 3), (3, 5))
Using flag `small=True` will eliminate supersets from the loops:
>>> for i in sorted(find_loops(dat, small=True),key=lambda x:(len(x),x)):
... print(i)
...
((0, 8), (2, 9))
((1, 3), (2, 5))
((1, 3), (2, 9))
((1, 5), (2, 9))
((2, 5), (3, 8))
((0, 8), (1, 9), (2, 3))
((0, 8), (1, 9), (2, 5))
((0, 8), (1, 9), (3, 5))
((1, 3), (2, 8), (3, 5))
Note that the subsets that are found will depend on the order of the
sets. Both c and d below have 3 loops. But with the re-ordering in d,
one of them will be a subset of another loop.
>>> c = [{8, 7}, {9, 8, 7}, {9, 7}]
>>> for i in find_loops(c): print (i)
...
((0, 8), (1, 7))
((0, 8), (1, 9), (2, 7))
((1, 7), (2, 9))
>>> d = [{8, 7}, {9, 7}, {9, 8, 7}] # second loop is subset of first
...
((0, 8), (1, 7), (2, 9))
((1, 7), (2, 9))
((0, 8), (2, 7))
>>> find_loops([{3, 4}, {16, 4}, {24, 20}, {32, 3}, {32, 20}, {16, 24}])
{{((0, 3), (1, 4), (2, 24), (3, 32), (4, 20), (5, 16))}
>>> find_loops([{3, 4}])
set()
"""
loops = set()
def Loop(l):
if m := l.index(min(l)):
# smallest (x,y) is at position m instead of 0
if l[m-1][1] == l[m][1]:
# reverse the loop and update m
l = l[::-1]
m = -m - 1 # if it was at 1 it is not at -2
l = l[m:] + l[:m]
# put in canonical form
L = l[::2]
L.sort()
L = tuple(L)
# store if not already seen
if L not in loops:
loops.add(L)
# Standardize input
assert type(points[0]) in (list, set, tuple), '(x,y) are points {a,b} or [a,b] are sets
if type(points[0]) is not tuple:
points = [(x, y) for x, s in enumerate(points) for y in s]
# build maps of where each x or y is located
xmap = defaultdict(set)
ymap = defaultdict(set)
for x,y in points:
xmap[x].add(y)
ymap[y].add(x)
# Recursive function to extend the path
def extend_path(path, x0):
y1 = path[-1][1]
# Look for the next pair of points
same_y = [
(xi, y1) for xi in ymap[y1]
if xi not in {p[0] for p in path}
]
for x, y in same_y:
same_x = [
(x, yi) for yi in xmap[x]
if yi not in {p[1] for p in path}
]
for xi, yi in same_x:
# check for closure
if yi in xmap[x0]:
Loop(path + [(x, y), (xi, yi), (x0, yi)])
# extend
new_path = path + [(x, y), (xi, yi)]
extend_path(new_path, x0)
# choose the first 3 points to form a right and up direction (ccw turn)
# to limit how many path are pursued
for x1, y1 in points:
# Step 1: Find points with x2 > x1 and y2 = y1
candidates_x2_y2 = [(x2, y2) for x2, y2 in points if x2 > x1 and y2 == y1]
for x2, y2 in candidates_x2_y2:
# Step 2: Find points with x3 = x2 and y3 > y2
candidates_x3_y3 = [(x3, y3) for x3, y3 in points if x3 == x2 and y3 > y2]
for x3, y3 in candidates_x3_y3:
# Step 3: Start path extension
path = [(x1, y1), (x2, y2), (x3, y3)]
if (x1, y3) in points:
# check for closure
Loop(path + [(x1, y3)])
# extend
extend_path(path, x1)
return subsets(list(loops)) if small else loops
For a point of reference consider these results:
>>> D
[{1, 2, 3, 4, 5, 6, 7, 8},
{4, 11, 13, 14, 15, 16, 17},
{11, 20, 21, 23, 24, 28, 29},
{32, 34, 3, 37, 29, 30, 31},
{32, 39, 40, 41, 7, 44, 15, 20},
{6, 39, 47, 48, 16, 52, 53, 24},
{5, 47, 58, 61, 31},
{34, 8, 15, 52, 21, 58, 62, 63},
{37, 41, 14, 78, 23, 63},
{1, 40, 48, 17, 82, 23}]
>>> len(find_loops(D,small=1)) # 12 seconds
10440
>>> len(find_loops(D)) # 3 seconds
22261
(For the indended use, these times are not significant.)