Tags: c++, statistics, normal-distribution, anomaly-detection

How to convert percentage to z-score of normal distribution in C/C++?


The goal is to say: "These values lie within a band of 95 % of values around the mean in a normal distribution."

I am trying to convert a percentage to a z-score so that I can get the precise range of values. Something like <lower bound , upper bound> would be enough.

So I need something like

double z_score(double percentage) {
   // ...
}

// ...

// according to https://en.wikipedia.org/wiki/68–95–99.7_rule
z_score(68.27) == 1
z_score(95.45) == 2
z_score(99.73) == 3

I found an article explaining how to do it with a function from the Boost library, but

double z_score( double percentage ) {
    return - sqrt( 2 ) / boost::math::erfc_inv( 2 * percentage / 100 );
}

does not work properly and returns unexpected values.

z_score(95) == 1.21591 // instead of 1.96

Also, the Boost library is kinda heavy, and I plan to use this in a Ruby gem, so it should be as lightweight as possible.

Does anyone have an idea?


Solution

  • I say you were "close enough".

    #include <iostream>
    #include <boost/math/special_functions/erf.hpp>
    #include <cmath>
    
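    // Number of standard deviations n such that `percentage` % of a normal
    // distribution lies within mean +/- n * sigma.
    double z_score(double percentage) {
        return std::sqrt(2.0) * boost::math::erf_inv(percentage / 100);
    }
    
    int main() {
        const double percentages[] = {68.27, 95.45, 99.73};
        for (double p : percentages) {
            std::cout << p << " " << z_score(p) << "\n";
        }
    }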
    

    outputs:

    68.27 1.00002
    95.45 2
    99.73 2.99998
    

    I do not know where the minus sign in front, the erfc_inv (instead of erf_inv), or the division of sqrt(2) by the result came from. From the Wikipedia article on the normal distribution (section Normal_distribution#Standard_deviation_and_coverage) I read that:

    p <- this is probability, ie. your input
    u <- mean value
    o <- std dev
    n <- the count of std deviations from mean, ie. 1, 2, 3 etc.
    p = F(u + no) - F(u - no) = fi(n) - fi(-n) = erf(n / sqrt(2))
    p = erf(n / sqrt(2))
    erf_inv(p) = n / sqrt(2)
    erf_inv(p) * sqrt(2) = n
    n = sqrt(2) * erf_inv(p)
    

    As for "the boost library is kinda heavy": a five-minute search turned up two standalone C implementations of erf_inv.