Search code examples
algorithmcombinationsncr

Fast way of getting r-long combinations of set A that have at least one element from set B, which is a subset of A


For example, if A={0,1,2,3,4}, r=3 and B={1,4}, the result would be:

[0, 1, 2]
[0, 1, 3]
[0, 1, 4]
[0, 2, 4]
[0, 3, 4]
[1, 2, 3]
[1, 2, 4]
[1, 3, 4]
[2, 3, 4]

That's all the r-long combinations of A, excluding [0, 2, 3], because that one doesn't contain either 1 or 4.

The solution that I currently have is the following, using the fastest algorithm for getting normal combinations I know of, and just doing a simple check to see if combinations generated also contain an element of B (java):

    int[] A = new int[]{0,1,2,3,4};
    int[] B = new int[]{1,4};
    int n = A.length;
    int r = 3;

    int[] picks = new int[r]; //Holds indexes of elements in A
    for (int i = 0; i < picks.length; i++)
        picks[i] = i;

    int lastindex = picks.length - 1;

    outer:
    while (true) {
        int at = lastindex;
        while (true) {
            picks[at] += 1;
            if (picks[at] < n) {
                int displacement = picks[at] - at; // at + displacement = picks[at], at + displacement + 1 = picks[at] + 1 ,...

                // Make all picks elements after at y = picks[at] + x, so picks={0, 2, 4, 6, 18, 30} & at=3 --> picks={0, 2, 4, 5, 6, 7}
                // (Note that this example will never take place in reality, because the 18 or the 30 would be increased instead, depending on what n is)

                // Do the last one first, because that one is going to be the biggest,
                picks[lastindex] = lastindex + displacement;
                if (picks[lastindex] < n) { // and check if it doesn't overflow
                    for (int i = at + 1; i < lastindex; i++)
                        picks[i] = i + displacement;

                    int[] combination = new int[r];
                    for (int i = 0; i < r; i++)
                        combination[i] = A[picks[i]];
                    System.out.print(Arrays.toString(combination));
                    //^With this, all r-long combinations of A get printed

                    //Straightforward, bruteforce-ish way of checking if int[] combination
                    //contains any element from B
                    presence:
                    for (int p : combination) {
                        for (int b : B) {
                            if (p==b) {
                                System.out.print(" <-- Also contains an element from B");
                                break presence;
                            }
                        }
                    }

                    System.out.println();

                    break;
                }
            }
            at--;
            if (at < 0) {
                //Moving this check to the start of the while loop will make this natively support pick 0 cases (5C0 for example),
                //but reduce performance by I believe quite a bit. Probably better to special-case those (I haven't
                // done that in this test tho)
                break outer;
            }
        }
    }

output:

[0, 1, 3] <-- Also contains an element from B
[0, 1, 4] <-- Also contains an element from B
[0, 2, 3]
[0, 2, 4] <-- Also contains an element from B
[0, 3, 4] <-- Also contains an element from B
[1, 2, 3] <-- Also contains an element from B
[1, 2, 4] <-- Also contains an element from B
[1, 3, 4] <-- Also contains an element from B
[2, 3, 4] <-- Also contains an element from B

As written in the comments, I believe this method to be very rudimentary. Can anyone think of a faster way to do this?


Solution

  • Assuming you have a int[][] FindCombinations(int[] set, int length) function that returns a list of all the length-long combinations of set, do the following (pseudo-code):

    for i=1 to B.length
    {
      int bi = B[i];
    
      A = A - bi; // remove bi from A
      foreach C in FindCombinations(A, r-1)
      {
         output C+bi // output the union of C and {bi}
      }
    }
    

    This way all combinations contain at least one element from B (and may also contain elements of B that have not yet been used) without much extra work. All other combinations are eliminated at no cost (the don't have to be found at all) and also the test that a combination contains an element from B for each combination is also eliminated.

    Whether this algorithm is faster, greatly depends on how efficently you can add/remove elements from a set and the percentage of included vs excluded combinations (i.e. if you only end up excluding 1% of the total combinations it is probably not worth it)

    Note that when getting the combinations to union with {b[i]} these may also contain an element B[j] where j>i. When you get to the point that you get the combinations to union with B[j] none of them will contain B[i], so all combinations are unique.