c++ statistics normal-distribution anomaly-detection

How to convert percentage to z-score of normal distribution in C/C++?

The goal is to say: "These values lie within a band of 95 % of values around the mean in a normal distribution."

Now I am trying to convert a percentage to a z-score so that I can get the precise range of values. Something like <lower bound, upper bound> would be enough.

So I need something like

double z_score(double percentage) {
   // ...
}

// ...

// according to https://en.wikipedia.org/wiki/68–95–99.7_rule
z_score(68.27) == 1
z_score(95.45) == 2
z_score(99.73) == 3

I found an article explaining how to do it with a function from the boost library, but

double z_score( double percentage ) {
    return - sqrt( 2 ) / boost::math::erfc_inv( 2 * percentage / 100 );
}

does not work properly and it returns weird values.

z_score(95) == 1.21591 // instead of 1.96

Also the boost library is kinda heavy and I plan to use it for a Ruby gem, so it should be as lightweight as possible.

Does anyone have an idea?

Solution

I say you were "close enough".

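#include <boost/math/special_functions/erf.hpp>
#include <cmath>
#include <initializer_list>
#include <iostream>

// Number of standard deviations n such that `percentage` percent of a
// normal distribution lies within n standard deviations of the mean.
double z_score(double percentage) {
    return std::sqrt(2.0) * boost::math::erf_inv(percentage / 100.0);
}

int main() {
    // Print each percentage next to its z-score.
    for (double percentage : {68.27, 95.45, 99.73}) {
        std::cout << percentage << " " << z_score(percentage) << "\n";
    }
}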

outputs:

68.27 1.00002
95.45 2
99.73 2.99998
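
Once the z-score is known, the <lower bound, upper bound> band from the question is just mean ± z * stddev. A minimal sketch, assuming the mean and standard deviation of the data are already available; the name central_band is made up for this example and is not part of Boost or the standard library:

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not from the original post): the symmetric band
// [mean - z * stddev, mean + z * stddev] around the mean.
std::pair<double, double> central_band(double mean, double stddev, double z) {
    assert(stddev >= 0.0 && z >= 0.0);
    const double half_width = z * stddev;
    return {mean - half_width, mean + half_width};
}
```

For example, central_band(100.0, 15.0, 2.0) gives the pair (70, 130), which is the range holding roughly 95.45 % of values for a mean of 100 and a standard deviation of 15.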

I do not know where your formula got the leading minus sign, why it uses erfc_inv instead of erf_inv, or why sqrt(2) is divided by the result instead of multiplied by it. From the Wikipedia section Normal_distribution#Standard_deviation_and_coverage I read that:

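p <- the probability, i.e. your input as a fraction (percentage / 100)
u <- mean value
o <- std dev
n <- the number of std deviations from the mean, i.e. 1, 2, 3 etc.
F <- CDF of the normal distribution with mean u and std dev o
fi <- CDF of the standard normal distribution
p = F(u + n*o) - F(u - n*o) = fi(n) - fi(-n) = erf(n / sqrt(2))
p = erf(n / sqrt(2))
erf_inv(p) = n / sqrt(2)
erf_inv(p) * sqrt(2) = n
n = sqrt(2) * erf_inv(p)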
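
The relation p = erf(n / sqrt(2)) can be sanity-checked in the forward direction with nothing but std::erf from <cmath> (available since C++11), so no Boost is needed for the check. The function name coverage_percent is made up for this sketch:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: percentage of a normal distribution that lies
// within n standard deviations of the mean, i.e. 100 * erf(n / sqrt(2)).
double coverage_percent(double n) {
    assert(n >= 0.0);
    return 100.0 * std::erf(n / std::sqrt(2.0));
}
```

coverage_percent(1), coverage_percent(2) and coverage_percent(3) come out at roughly 68.27, 95.45 and 99.73, matching the 68–95–99.7 rule quoted in the question.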

As for "Also the boost library is kinda heavy": a five-minute search turned up a couple of standalone C implementations of erf_inv, so you do not have to depend on boost for this.
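
If pulling in even a small third-party file is undesirable, the inverse can also be approximated with Newton's method on top of std::erf. The sketch below is an illustrative substitute and not one of the implementations found in that search; the name erf_inv_newton, the iteration cap and the tolerance are all assumptions chosen for this example. It starts at 0, from where the iterates approach the root from below, and it loses accuracy as the percentage gets very close to 100:

```cpp
#include <cassert>
#include <cmath>

// Sketch of an inverse error function via Newton's method on std::erf.
// Hypothetical helper, valid for p strictly between -1 and 1.
double erf_inv_newton(double p) {
    assert(p > -1.0 && p < 1.0);
    const double two_over_sqrt_pi = 1.1283791670955126;  // 2 / sqrt(pi)
    const double target = std::fabs(p);  // erf is odd, so solve for |p|
    double x = 0.0;
    for (int i = 0; i < 200; ++i) {
        // Derivative of erf: erf'(x) = 2 / sqrt(pi) * exp(-x^2)
        const double slope = two_over_sqrt_pi * std::exp(-x * x);
        const double step = (target - std::erf(x)) / slope;
        x += step;
        if (std::fabs(step) < 1e-12) break;  // converged
    }
    return p < 0.0 ? -x : x;
}

// Same formula as in the answer, n = sqrt(2) * erf_inv(p), without Boost.
double z_score(double percentage) {
    return std::sqrt(2.0) * erf_inv_newton(percentage / 100.0);
}
```

With this, z_score(95) should come out at about 1.96, the value expected in the question.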