I'm writing a small library for statistical sampling which needs to run as fast as possible. In profiling I discovered that around 40% of the time taken in the function is spent computing Stirling's approximation for the logarithm of the factorial. I'm focusing my optimization efforts on that piece of it. Here's my code (which uses MPFR):
const double AL[8] =
{ 0.0, 0.0, 0.6931471806, 1.791759469, 3.178053830, 4.787491743,
6.579251212, 8.525161361 };
void HGD::mpfr_afc(mpfr_t &ret, const mpfr_t &val){
if(mpfr_cmp_ui(val, 7) <= 0){
mpfr_set_d(ret, AL[mpfr_get_ui(val, MPFR_RNDN)], MPFR_RNDN);
}else{
mpfr_set(ret, val, MPFR_RNDN);
mpfr_add_d(ret, ret, 0.5, MPFR_RNDN);
mpfr_log(LV, val, MPFR_RNDN);
mpfr_mul(ret, LV, ret, MPFR_RNDN);
mpfr_sub(ret, ret, val, MPFR_RNDN);
mpfr_add_d(ret, ret, 0.399089934, MPFR_RNDN);
}
}
I have a couple different ideas:
Are there other approaches I could take?
Switch to native arithmetic when numbers can fit in machine data types
That would be my first attempt. MPFR is likely to be a performance killer.
It seems to me you want to compute the logarithm of n! which you are already approximating with Stirling's formula.
Note that n!=Gamma(n+1). There are (seemingly) highly optimized functions to compute both the Gamma function and its logarithm. For example:
I would roll my own coarser approximation only if all the above fails performance-wise.