Suppose that I have a sorted array, N, consisting of n elements. Now, given k, I need a highly efficient method to generate the k-combination that would be the middle combination (if all the k-combinations were lexicographically sorted).
Example:
N = {a,b,c,d,e} , k = 3
1: a,b,c
2: a,b,d
3: a,b,e
4: a,c,d
5: a,c,e
6: a,d,e
7: b,c,d
8: b,c,e
9: b,d,e
10: c,d,e
I need the algorithm to generate combination number 5.
The Wikipedia page on the combinatorial number system explains how this can be obtained (in a greedy way). However, since n is very large and I need to find the middle combination for all k's less than n, I need something much more efficient than that.
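For reference, here is a minimal C# sketch of that greedy approach, adapted to lexicographic order over 0-based element indexes. It walks the candidate elements from smallest to largest. For each candidate x, it subtracts the size of the block of combinations that start with x, which is C(n - x - 1, k - i - 1), until the rank falls inside a block. The names `MiddleCombination`, `UnrankLex`, `Middle`, and `Binomial` are illustrative and do not come from any library. `System.Numerics.BigInteger` is used because C(n, k) overflows 64-bit integers quickly for large n. The middle is assumed to be the 0-based rank (C(n, k) - 1) / 2, which matches combination number 5 in the example above.

```csharp
using System;
using System.Numerics;

static class MiddleCombination
{
    // C(n, k) via the multiplicative formula; returns 0 when k is out of range.
    static BigInteger Binomial(int n, int k)
    {
        if (k < 0 || k > n) return BigInteger.Zero;
        k = Math.Min(k, n - k);
        BigInteger result = BigInteger.One;
        for (int i = 1; i <= k; i++)
            result = result * (n - k + i) / i; // exact: result now equals C(n - k + i, i)
        return result;
    }

    // Element indexes (ascending) of the combination with the given 0-based rank
    // when all k-combinations of {0, ..., n-1} are sorted lexicographically.
    public static int[] UnrankLex(int n, int k, BigInteger rank)
    {
        if (rank < 0 || rank >= Binomial(n, k))
            throw new ArgumentOutOfRangeException(nameof(rank));
        var result = new int[k];
        int x = 0; // smallest element still available
        for (int i = 0; i < k; i++)
        {
            // Skip every candidate whose block of combinations lies entirely before the rank.
            while (true)
            {
                BigInteger count = Binomial(n - x - 1, k - i - 1);
                if (rank < count) break;
                rank -= count;
                x++;
            }
            result[i] = x++;
        }
        return result;
    }

    // Middle combination: rank (C(n, k) - 1) / 2, the lower middle when C(n, k) is even.
    public static int[] Middle(int n, int k) => UnrankLex(n, k, (Binomial(n, k) - 1) / 2);

    static void Main()
    {
        Console.WriteLine(string.Join(",", Middle(5, 3)));  // 0,2,4 (a,c,e)
        Console.WriteLine(string.Join(",", Middle(10, 3))); // 1,6,8
    }
}
```

Each step recomputes a binomial coefficient, so this sketch costs O(n·k) big-integer operations per call. A faster version could update the coefficient incrementally (for example, C(m - 1, j) = C(m, j)·(m - j)/m) instead of recomputing it.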
I'm hoping that since the combination of interest always lies in the middle, there is some sort of a straightforward method for finding it. For example, the first k-combination in the above list is always given by the first k elements in N, and similarly the last combination is always given by the last k elements. Is there such a way to find the middle combination as well?
If you are looking for a way to obtain the K-indexes from the lexicographic index (or rank) of a unique combination, then your problem can be solved with the binomial coefficient, which counts the unique combinations of K items that can be chosen from a total of N items.
I have written a class in C# to handle common functions for working with the binomial coefficient. It performs the following tasks:
Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters.
Converts the K-indexes to the proper lexicographic index or rank of an entry in the sorted binomial coefficient table. It uses a mathematical property inherent in Pascal's Triangle, which makes it much faster than older published techniques that iterate over the set.
Converts an index in the sorted binomial coefficient table to the corresponding K-indexes. This technique is also much faster than older iterative solutions.
Uses Mark Dominus's method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.
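The rank conversion and the binomial coefficient in the list above can be sketched independently of the class. This sketch assumes the k-indexes are 0-based and listed largest first. It also assumes that the index is the combinatorial number system sum C(c_K, K) + ... + C(c_1, 1), which is consistent with the output shown later (4 2 0 maps to index 5). `CombinadicRank`, `Choose`, and `Rank` are hypothetical names, not the class's API. `Choose` follows the multiply-one-then-divide-one loop described in Mark Dominus's write-up, where each intermediate value is itself an exact binomial coefficient.

```csharp
using System;

static class CombinadicRank
{
    // C(n, k), alternating one multiplication and one division per step,
    // so after step d the running value equals C(n_original, d).
    public static long Choose(long n, long k)
    {
        if (k < 0 || k > n) return 0;
        if (k > n - k) k = n - k; // use the smaller of k and n - k
        long result = 1;
        for (long d = 1; d <= k; d++)
        {
            result *= n--; // can still overflow for very large inputs
            result /= d;   // always divides exactly
        }
        return result;
    }

    // Rank of a combination whose 0-based k-indexes are listed largest first:
    // rank = C(c_K, K) + C(c_(K-1), K-1) + ... + C(c_1, 1).
    public static long Rank(int[] descendingKIndexes)
    {
        int k = descendingKIndexes.Length;
        long rank = 0;
        for (int i = 0; i < k; i++)
            rank += Choose(descendingKIndexes[i], k - i);
        return rank;
    }

    static void Main()
    {
        Console.WriteLine(Rank(new[] { 4, 2, 0 }));       // 5
        Console.WriteLine(Rank(new[] { 8, 3, 1 }));       // 60
        Console.WriteLine(Rank(new[] { 9, 3, 2, 1, 0 })); // 126
    }
}
```

Wrapping the loop body in a `checked` block would turn a silent overflow into an `OverflowException`.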
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor takes a bool value called InitTable. When it is true, the constructor creates a generic list to hold the objects to be managed, and when it is false, no table is created. The table is not needed to use the four methods above. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with several cases and there are no known bugs.
To read about this class and download the code, see Tablizing The Binomial Coeffieicent.
The following tested code calculates the median element for any N choose K case, using the class's index ordering (k-indexes are 0-based and listed largest first):
```csharp
using System;
using System.Text;

// BinCoeff<T> is the class from the download above; it is not part of .NET.
// MedianTest is only a placeholder class name to hold the two methods.
class MedianTest
{
    void TestMedianMethod()
    {
        // This test driver tests out the GetMedianNChooseK method.
        GetMedianNChooseK(5, 3);  // 5 choose 3 case.
        GetMedianNChooseK(10, 3); // 10 choose 3 case.
        GetMedianNChooseK(10, 5); // 10 choose 5 case.
    }

    private void GetMedianNChooseK(int N, int K)
    {
        // This method calculates the median index and the k-indexes for that index.
        String S;
        // Create the bin coeff object required to get all
        // the combos for this N choose K combination.
        BinCoeff<int> BC = new BinCoeff<int>(N, K, false);
        int NumCombos = BinCoeff<int>.GetBinCoeff(N, K);
        // Calculate the median value, which in this case is the number of combos
        // for this N choose K case divided by 2 (integer division).
        int MedianValue = NumCombos / 2;
        // The KIndexes array holds the indexes for the specified element.
        int[] KIndexes = new int[K];
        // Get the k-indexes for this combination.
        BC.GetKIndexes(MedianValue, KIndexes);
        StringBuilder SB = new StringBuilder();
        for (int Loop = 0; Loop < K; Loop++)
        {
            SB.Append(KIndexes[Loop].ToString());
            if (Loop < K - 1)
                SB.Append(" ");
        }
        // Print out the information.
        S = N.ToString() + " choose " + K.ToString() + " case:\n";
        S += " Number of combos = " + NumCombos.ToString() + "\n";
        S += " Median Value = " + MedianValue.ToString() + "\n";
        S += " KIndexes = " + SB.ToString() + "\n\n";
        Console.WriteLine(S);
    }
}
```
Output:
```
5 choose 3 case:
 Number of combos = 10
 Median Value = 5
 KIndexes = 4 2 0

10 choose 3 case:
 Number of combos = 120
 Median Value = 60
 KIndexes = 8 3 1

10 choose 5 case:
 Number of combos = 252
 Median Value = 126
 KIndexes = 9 3 2 1 0
```
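The k-indexes in this output are ordered by the combinatorial number system (largest index first). That ordering sorts combinations colexicographically, which differs from the lexicographic order in the question. The two orders are related by mirroring. If a set S has index r in the class's order, then the set {N-1-x : x in S} has lexicographic index C(N,K) - 1 - r. Mirroring the median output therefore gives the exact lexicographic middle when C(N,K) is odd and the lower of the two middles when it is even. For 5 choose 3, 4 2 0 mirrors to 0 2 4, which is a,c,e. The brute-force sketch below checks this for small cases. `MirrorCheck`, `Mirror`, `LexCombinations`, and `LexIndex` are hypothetical helpers written only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MirrorCheck
{
    // Map each k-index x to n - 1 - x and return the result in ascending order.
    public static int[] Mirror(int n, int[] kIndexes) =>
        kIndexes.Select(x => n - 1 - x).OrderBy(x => x).ToArray();

    // All k-combinations of {0, ..., n-1} in lexicographic order (brute force, small n only).
    public static List<int[]> LexCombinations(int n, int k)
    {
        var all = new List<int[]>();
        var current = new int[k];
        void Fill(int pos, int start)
        {
            if (pos == k) { all.Add((int[])current.Clone()); return; }
            for (int x = start; x <= n - (k - pos); x++)
            {
                current[pos] = x;
                Fill(pos + 1, x + 1);
            }
        }
        Fill(0, 0);
        return all;
    }

    // 0-based lexicographic index of an ascending combination, by linear search.
    public static int LexIndex(int n, int[] ascending) =>
        LexCombinations(n, ascending.Length).FindIndex(c => c.SequenceEqual(ascending));

    static void Main()
    {
        int[] m53 = Mirror(5, new[] { 4, 2, 0 });   // median output for 5 choose 3
        int[] m103 = Mirror(10, new[] { 8, 3, 1 }); // median output for 10 choose 3
        Console.WriteLine($"{string.Join(" ", m53)} at lex index {LexIndex(5, m53)}");    // 0 2 4 at lex index 4
        Console.WriteLine($"{string.Join(" ", m103)} at lex index {LexIndex(10, m103)}"); // 1 6 8 at lex index 59
    }
}
```

Lexicographic index 4 is combination number 5 in the question's list. Index 59 is the lower middle of the 120 combinations for 10 choose 3.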
You should be able to port this class fairly easily to the language of your choice, and you probably will not need to port the generic part of the class to accomplish your goals. Depending on the number of combinations you are working with, you might need a wider integer type than 4-byte ints (for example, a 64-bit long or an arbitrary-precision integer), since C(N, K) grows very quickly as N increases.