Search code examples
algorithmgroupingcombinatorics

An efficient method to generate all possible ways to pair up items in a data set


This is somewhat of a combinatorial problem; I'm trying to figure out an efficient way to pair up all items in a data set.

For example, I have an array of length 6: [1,2,3,4,5,6], and I want to make all possible pairings of the contents in the array as so:

[1,2],[3,4],[5,6]  
[1,2],[3,5],[4,6]  
[1,2],[3,6],[4,5]  
[1,3],[2,4],[5,6]  
[1,3],[2,5],[4,6]  
...

and so on. In this example, there would be 15 combinations total. In general, the number of possibilities in this solution should be (N-1)!! where N is the length of the array or the number of items to be paired up. N will always be even in this case. Ideally, an algorithm will generate all possibilities serially, without having to store very large arrays.

For example, one way would work somewhat like a 'round robin' scheduling algorithm where you split the array into N/2:

[1,2,3]  
[4,5,6]  

and rotate the [5,3,6] clockwise to generate 3 unique pairings (counting vertically) with [1,2,4] fixed as so:

[1,2,3]  
[4,5,6]

[1,2,5]  
[4,6,3]  

[1,2,6]  
[4,3,5]

and then rotate [4,2,3,6,5] clockwise to generate 5 unique pairings with [1] fixed, going from the smallest innermost case (N=4) outwards until the end, but recursively. As it is I'm not sure how to best express this as code or if there is a more efficient way of doing this, as N can be very large.


Solution

  • You can generate the pairs using the standard recursive algorithm for generating permutations of a list, but with a twist that each time you recurse you advance 2 elements along the list, and the first remaining element in the list is always the first element of the pair you output at each recursion, with the second of the pair being each of the other remaining elements.

    Always choosing the first remaining element as the first item in the pair avoids generating the same pairings with the pairs in different order.

    As with the standard algorithm, you can generate the permutations in place without making copies of the array, by swapping elements in place, recursing and then swapping back.

    Here is some C code to demonstrate the algorithm (I realise this question is tagged Fortran but just think of it as pseudo code). This code passes the list and count when it recurses, but you could implement it with items and itemcount as global variables:

    // start is the current position in the list, advancing by 2 each time
    // pass 0 as start when calling at the top level
    void generatePairings(int* items, int itemcount, int start)
    {
        if(itemcount & 1)
            return; // must be an even number of items
    
        // is this a complete pairing?
        if(start == itemcount)
        {
            // output pairings:
            int i;
            for(i = 0; i<itemcount; i+=2)
            {
                printf("[%d, %d] ", items[i], items[i+1]);
            }
            printf("\n");
            return;
        }
    
        // for the next pair, choose the first element in the list for the
        // first item in the pair (meaning we don't have to do anything 
        // but leave it in place), and each of the remaining elements for
        // the second item:
        int j;
        for(j = start+1; j<itemcount; j++)
        {
            // swap start+1 and j:
            int temp = items[start+1];
            items[start+1] = items[j];
            items[j] = temp;
    
            // recurse:
            generatePairings(items, itemcount, start+2);
    
            // swap them back:
            temp = items[start+1];
            items[start+1] = items[j];
            items[j] = temp;
        }
    }
    
    int main(void) {
        int items[6] = {1, 2, 3, 4, 5, 6};
        generatePairings(items, 6, 0);
    
        return 0;
    }
    

    Output:

    [1, 2] [3, 4] [5, 6] 
    [1, 2] [3, 5] [4, 6] 
    [1, 2] [3, 6] [5, 4] 
    [1, 3] [2, 4] [5, 6] 
    [1, 3] [2, 5] [4, 6] 
    [1, 3] [2, 6] [5, 4] 
    [1, 4] [3, 2] [5, 6] 
    [1, 4] [3, 5] [2, 6] 
    [1, 4] [3, 6] [5, 2] 
    [1, 5] [3, 4] [2, 6] 
    [1, 5] [3, 2] [4, 6] 
    [1, 5] [3, 6] [2, 4] 
    [1, 6] [3, 4] [5, 2] 
    [1, 6] [3, 5] [4, 2] 
    [1, 6] [3, 2] [5, 4] 
    

    If you are doing this on a list of large objects, it's more efficient to permute a list of indices and then use those to index into your array of objects, rather than doing expensive swap operations on large amounts of data.