Search code examples
searchoptimizationduplicatesutilitya-star

constraint to avoid generating duplicates in this search task


I have to solve the following optimization problem: Given a set of elements (E1,E2,E3,E4,E5,E6) create an arbitrary set of sequences e.g.

seq1:E1,E4,E3
seq2:E2
seq3:E6,E5

and given a function f that gives a value for every pair of elements e.g.

f(E1,E4) = 5
f(E4,E3) = 2
f(E6,E5) = 3
...

in addition it also gives a value for the pair of an element combined with some special element T, e.g.

f(T,E2) = 10
f(E2,T) = 3
f(E5,T) = 1
f(T,E6) = 2
f(T,E1) = 4
f(E3,T) = 2
...

The utility function that must be optimized is the following: The utility of a sequence set is the sum of the utility of all sequences. The utility of a sequence A1,A2,A3,...,AN is equal to f(T,A1)+f(A1,A2)+f(A2,A3)+...+f(AN,T) for our example set of sequences above this leads to

seq1: f(T,E1)+f(E1,E4)+f(E4,E3)+f(E3,T) = 4+5+2+2=13
seq2: f(T,E2)+f(E2,T) =10+3=13
seq3: f(T,E6)+f(E6,E5)+f(E5,T) =2+3+1=6
Utility(set) = 13+13+6=32

I try to solve a larger version (more elements than 6, rather 1000) of this problem using A* and some heuristic. Starting from zero sequences and stepwise adding elements either to existing sequences or as a new sequence, until we obtain a set of sequences containing all elements. The problem I run into is the fact that while generating possible solutions I end up with duplicates, for example in above example all the following combinations are generated:

seq1:E1,E4,E3
seq2:E2
seq3:E6,E5
+
seq1:E1,E4,E3
seq2:E6,E5
seq3:E2
+
seq1:E2
seq2:E1,E4,E3
seq3:E6,E5
+
seq1:E2
seq2:E6,E5
seq3:E1,E4,E3
+
seq1:E6,E5
seq2:E2
seq3:E1,E4,E3
+
seq1:E6,E5
seq2:E1,E4,E3
seq3:E2

which all have equal utility, since the order of the sequences does not matter. These are all permutations of the 3 sequences, since the number of sequences is arbitrairy there can be as much sequences as elements and a faculty(!) amount of duplicates generated... One way to solve such a problem is keeping already visited states and don't revisit them. However since storing all visited states requires a huge amount of memory and the fact that comparing two states can be a quite expensive operation, I was wondering whether there wasn't a way I could avoid generating these in the first place.

THE QUESTION: Is there a way to stepwise construct all these sequence constraining the adding of elements in a way that only combinations of sequences are generated rather than all variations of sequences.(or limit the number of duplicates)

As an example, I only found a way to limit the amount of 'duplicates' generated by stating that an element Ei should always be in a seqj with j<=i, therefore if you had two elements E1,E2 only

seq1:E1
seq2:E2

would be generated, and not

seq1:E2
seq2:E1

I was wondering whether there was any such constraint that would prevent duplicates from being generated at all, without failing to generate all combinations of sets.


Solution

  • Well, it is simple. Allow generation of only such sequences that are sorted according to first member, that is, from the above example, only

    seq1:E1,E4,E3
    seq2:E2
    seq3:E6,E5
    

    would be correct. And this you can guard very easily: never allow additional sequence that has its first member less than the first member of its predecessor.