Search code examples
cperformancealgorithmmathematical-optimizationbinomial-coefficients

Which is better way to calculate nCr


Approach 1:
C(n,r) = n!/(n-r)!r!

Approach 2:
In the book Combinatorial Algorithms by wilf, i have found this:
C(n,r) can be written as C(n-1,r) + C(n-1,r-1).

e.g.

C(7,4) = C(6,4) + C(6,3) 
       = C(5,4) + C(5,3) + C(5,3) + C(5,2)
       .   .
       .   .
       .   .
       .   .
       After solving
       = C(4,4) + C(4,1) + 3*C(3,3) + 3*C(3,1) + 6*C(2,1) + 6*C(2,2)

As you can see, the final solution doesn't need any multiplication. In every form C(n,r), either n==r or r==1.

Here is the sample code i have implemented:

int foo(int n,int r)
{
     if(n==r) return 1;
     if(r==1) return n;
     return foo(n-1,r) + foo(n-1,r-1);
}

See output here.

In the approach 2, there are overlapping sub-problems where we are calling recursion to solve the same sub-problems again. We can avoid it by using Dynamic Programming.

I want to know which is the better way to calculate C(n,r)?.


Solution

  • Both approaches will save time, but the first one is very prone to integer overflow.

    Approach 1:

    This approach will generate result in shortest time (in at most n/2 iterations), and the possibility of overflow can be reduced by doing the multiplications carefully:

    long long C(int n, int r) {
        if(r > n - r) r = n - r; // because C(n, r) == C(n, n - r)
        long long ans = 1;
        int i;
    
        for(i = 1; i <= r; i++) {
            ans *= n - r + i;
            ans /= i;
        }
    
        return ans;
    }
    

    This code will start multiplication of the numerator from the smaller end, and as the product of any k consecutive integers is divisible by k!, there will be no divisibility problem. But the possibility of overflow is still there, another useful trick may be dividing n - r + i and i by their GCD before doing the multiplication and division (and still overflow may occur).

    Approach 2:

    In this approach, you'll be actually building up the Pascal's Triangle. The dynamic approach is much faster than the recursive one (the first one is O(n^2) while the other is exponential). However, you'll need to use O(n^2) memory too.

    # define MAX 100 // assuming we need first 100 rows
    long long triangle[MAX + 1][MAX + 1];
    
    void makeTriangle() {
        int i, j;
    
        // initialize the first row
        triangle[0][0] = 1; // C(0, 0) = 1
    
        for(i = 1; i < MAX; i++) {
            triangle[i][0] = 1; // C(i, 0) = 1
            for(j = 1; j <= i; j++) {
                triangle[i][j] = triangle[i - 1][j - 1] + triangle[i - 1][j];
            }
        }
    }
    
    long long C(int n, int r) {
        return triangle[n][r];
    }
    

    Then you can look up any C(n, r) in O(1) time.

    If you need a particular C(n, r) (i.e. the full triangle is not needed), then the memory consumption can be made O(n) by overwriting the same row of the triangle, top to bottom.

    # define MAX 100
    long long row[MAX + 1];
    
    int C(int n, int r) {
        int i, j;
    
        // initialize by the first row
        row[0] = 1; // this is the value of C(0, 0)
    
        for(i = 1; i <= n; i++) {
            for(j = i; j > 0; j--) {
                 // from the recurrence C(n, r) = C(n - 1, r - 1) + C(n - 1, r)
                 row[j] += row[j - 1];
            }
        }
    
        return row[r];
    }
    

    The inner loop is started from the end to simplify the calculations. If you start it from index 0, you'll need another variable to store the value being overwritten.