Search code examples
pythonscipyscipy.stats

How is the poisson data generated inside of the special.pdtr function?


I'm interested in understanding how the special.pdtr function is generated as I'd like to see the computational method. I'm working on a project to see if I could calculate it in a similar way to simulate another slightly more complex probability distribution.

I've looked in the documentation and all it does is reference the general form of the probability distribution but doesn't say how the code comes up with the output.


Solution

  • This is a scipy.special function, so to find it, you need to look at the file scipy/special/functions.json.

        "pdtr": {
            "cephes.h": {
                "pdtr": "dd->d"
            }
        },
    

    This tells us that pdtr is implemented using a function also called pdtr. (This isn't always the case - sometimes the implementation has a different name.) It also tells us that the code came from the Cephes library.

    Next, search the SciPy repo for "pdtr." Look for C, Cython, or Fortran, as that's usually how these are implemented.

    This finds an implementation inside scipy/special/cephes/pdtr.c.

    double pdtr(double k, double m)
    {
        double v;
    
        if (k < 0 || m < 0) {
            sf_error("pdtr", SF_ERROR_DOMAIN, NULL);
            return (NAN);
        }
        if (m == 0.0) {
            return 1.0;
        }
        v = floor(k) + 1;
        return (igamc(v, m));
    }
    

    This tells us that when it is computing the poisson CDF, it is actually using the incomplete gamma function to do so.

    You may also find it useful to read how the igamc function is implemented. To summarize, it chooses between one of three different approximations, depending on where in the gamma function it is. It also uses some identities of the incomplete gamma function to reduce more complex cases to simpler cases which can be evaluated more accurately.