std::pow very different behavior for different exponents

I am currently trying to optimize some code where 50% of the time is spent in std::pow(). I know that the exponent will always be a positive integer, and the base will always be a double in the interval (0, 1). For fun, I wrote a function:

inline double int_pow(double base, int exponent)
{
    double out = 1.0;
    for(int i = 0; i < exponent; i++)
    {
        out *= base;
    }

    return out;
}

I am compiling with:

> g++ fast-pow.cpp -O3 --std=c++11

I generated 100 million doubles between (0, 1) and compared the timings of (1) std::pow (2) my homemade int_pow function from above and (3) direct multiplication. Here's a sketch of my timing routine (this is a very quickly put-together test):

void time_me(int exp, size_t reps)
{
    volatile double foo = 0.0;
    double base = 0.0;

    size_t i;
    for (i = 0; i < reps; ++i)
    {
        base = ((double) rand() / (RAND_MAX)) + 1;
        foo = pow(base, exp);
        // foo = int_pow(base, exp);
        // foo = base * base * base;
    }

    // check that the loop made it to the end
    std::cout << foo << "  " << i <<  std::endl;
}

int main()
{
    std::clock_t start;

    start = std::clock();
    time_me(3, 1e8);
    std::cout << "Time: " << (std::clock() - start) / (double)(CLOCKS_PER_SEC / 1000) << std::endl;

    return 0;
}

Here are the timings I've observed for various exponents:

0: std::pow 0.71s, int_pow 0.77s
2: std::pow 1.31s, int_pow 0.80s, direct mult 0.86s
3: std::pow 6.9s (!!), int_pow 0.84s, direct mult 0.76s
5: Similar to 3:

My Questions

So with this, my questions are:

Why does the performance of std::pow appear to degrade so badly for powers greater than 2?
Is there an existing faster power function when the base or exponent types are known ahead of time?
Is there something completely obvious I'm overlooking? I'm about to go through gut std::pow for the cases with known integer exponents, and would hate to have missed something completely trivial.

Thanks!!

Solution

std::pow() is a general purpose function designed to accept any pair of floating point values. It performs expensive computations and should be considered a slow function. However, apparently, a lot of peopled have abused it for squaring, so implementation of pow() in IBM Accurate Mathematical Library (which is used by glibc) was optimized for that particular case:

sysdeps/ieee754/dbl-64/e_pow.c:

double
__ieee754_pow (double x, double y)
{
  ...
  ...
  if (y == 1.0)
    return x;
  if (y == 2.0)
    return x * x;
  if (y == -1.0)
    return 1.0 / x;
  if (y == 0)
    return 1.0;

As you can see, exponent values 0, 1 and -1 are also handled specially, but those, at least, are mathematically significant special cases, whereas squaring is merely a statistically significant case, that shouldn't otherwise deserve special handling). EDIT: Exponent values 0, 1, 2, and -1 are the only ones that allow expressing std::pow(x,n) with (much faster) arithmetic operations without any loss of accuracy. See this answer for more details. Thus exponent value of 2 is not just a statistically significant case. END EDIT

If you want a fast alternative to std::pow() for non-negative integer values of the exponent and don't care about the slight accuracy loss, then

for sufficiently small values of the exponent use your implementation of int_pow();
otherwise, use exponentiation by squaring approach.

The boundary value of the exponent for selecting between the 1st and 2nd methods must be found via careful benchmarking.