Search code examples
javascriptstatisticscdf

Any obvious pitfalls in this Student's t-distribution CDF computation?


I have been looking for an efficient function that computes the CDF (cumulative distribution function) for the Student's t-distribution.

Here's what I have settled with after looking at another stackoverflow question, JStat library, the_subtprob function on Line 317 here.

Looking at the notes in the last reference led me to an out of print book, which does not help

If you are interested in more precise algorithms you could look at: StatLib: http://lib.stat.cmu.edu/apstat/ ;
Applied Statistics Algorithms by Griffiths, P. and Hill, I.D.
Ellis Horwood: Chichester (1985)

The cmu site had a FORTRAN function that I translated as I show below.

Looking at the other sources, I find higher order functions like the incomplete beta, log gamma, and the implementation seems more complex, and in one case iterative.

I'm wondering if there are any known pitfalls of this implementation. It appears to produce the same results as the others. Any thoughts on how one would go about evaluating this would be helpful as well.

function tcdf (t, v) {
    //
    // ALGORITHM AS 3  APPL. STATIST. (1968) VOL.17, P.189
    // STUDENT T PROBABILITY (LOWER TAIL)
    //               
    var b = v / (v + t * t),
        c = 1,
        s = 1,
        ioe = v % 2,
        k = 2 + ioe;

    if (v < 1) {
        return 0;
    }
    if (v >= 4) {
        while (k <= v - 2) {
            c *= b - b / k;
            s += c;
            k += 2;
        }
    }
    c = t / Math.sqrt(v);

    if (1 !== ioe) {
        return 0.5 + 0.5 * Math.sqrt(b) * c * s;
    }
    return 0.5 + ((1 === v ? 0 : b * c * s) + Math.atan(c)) / Math.PI;
}

Solution

  • Two possible issue with this algorithm.

    1. Handling large values of v. When v becomes large, we should recover the standard normal distribution. However, you have a while loop over v. So v=1000000 say, becomes slow

    2. Tail accuracy. How does the algorithm cope in the extreme tails? typically, we need to work with log to avoid rounding errors.