Search code examples
sqlsql-serverdistributionfactorialhyper

Is there a SQL solution for Hypergeometric Distribution for a finite set?


I am browsing for an equivalent SQL implementation of HYPGEOM.DIST in Microsoft Excel. I have explored the option of seeding a factorial table with distributions from 1 to N (1...N), but am seeking additional options.

HYPGEOM.DIST Equation

Given:

x = sample_s
n = number_sample
M = population_s
N = number_pop

Reference Excel functionality: https://support.office.com/en-us/article/hypgeom-dist-function-6dbd547f-1d12-4b1f-8ae5-b0d9e3d22fbf


Solution

  • You can precalculate factorials for any range you need using whatever technique you are most comfortable with, and store them in a separate table.

    The rest, I presume, should be trivial.

    EDIT: On second thought, I'm not sure how you plan to go above the limits you have already encountered. 170! ~= 7.25741562E+307, and the maximum value that can be stored as float is 1.79E+308, according to the documentation. It seems SQL Server is not really suitable for your task; you would have to look elsewhere for systems that handle (and store) very large numbers.

    It might be possible to resort to calculation in logarithms rather than actual values - this is especially tempting, considering that the original form of Stirling's formula is actually logarithmic. However that would require rewriting all these formulae into their logarithmic forms. This involves a fair bit of math, and my personal knowledge is insufficient even to tell whether it's actually achievable. But it doesn't mean it should stop you from trying :)