Below I have adapted code from William Kahan and K.C. Ng (look at the comment block on the bottom) written in 1986 to produce an approximation of 1 / sqrt(x) where x is an IEEE-754 double precision floating point number. It is this code that Cleve Moler and Greg Walsh adapted to become the "fast inverse square root" made famous by Quake III.
#include <stdio.h>
int main() {
double x = 3.1415926535;
double y, z;
unsigned long int x0, y0, k;
unsigned long long int xi, yi;
static int T2[64]= {
0x1500, 0x2ef8, 0x4d67, 0x6b02, 0x87be, 0xa395, 0xbe7a, 0xd866,
0xf14a, 0x1091b,0x11fcd,0x13552,0x14999,0x15c98,0x16e34,0x17e5f,
0x18d03,0x19a01,0x1a545,0x1ae8a,0x1b5c4,0x1bb01,0x1bfde,0x1c28d,
0x1c2de,0x1c0db,0x1ba73,0x1b11c,0x1a4b5,0x1953d,0x18266,0x16be0,
0x1683e,0x179d8,0x18a4d,0x19992,0x1a789,0x1b445,0x1bf61,0x1c989,
0x1d16d,0x1d77b,0x1dddf,0x1e2ad,0x1e5bf,0x1e6e8,0x1e654,0x1e3cd,
0x1df2a,0x1d635,0x1cb16,0x1be2c,0x1ae4e,0x19bde,0x1868e,0x16e2e,
0x1527f,0x1334a,0x11051,0xe951, 0xbe01, 0x8e0d, 0x5924, 0x1edd };
/* Convert double to unsigned long long (64 bits wide) */
xi = *(unsigned long long int*)&x;
/* Extract only the first half of the bits of the integer */
x0 = (xi & 0xffffffff00000000) >> 32;
/* Perform bit shift and subtract from an integer constant to get k */
k = 0x5fe80000 - (x0 >> 1);
/* T2 is indexed by mostly by exponent bits.
Only 5 highest bits from orig 52 in mantissa are used to index T2 */
y0 = k - T2[63 & (k >> 14)];
/* Pad with zeros for the LS 32 bits, convert back to long long */
yi = ((unsigned long long int) y0 << 32);
/* Get double from bits making up unsigned long long */
y = *(double*)&yi;
/* 1/sqrt(pi) ~ 0.564 */
printf("%lf\n", y);
return 0;
}
It looks like it is doing the same sort of thing as the Quake version (minus the newton-raphson step)
float Q_rsqrt( float number )
{
long i;
float x2, y;
const float threehalfs = 1.5F;
x2 = number * 0.5F;
y = number;
i = * ( long * ) &y; // evil floating point bit level hacking
i = 0x5f3759df - ( i >> 1 ); // what the fuck?
y = * ( float * ) &i;
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
return y;
}
We start by casting the bits of a double as an integer that is the approximate binary logarithm of the double we started with (Mitchell's method).
We shift those bits right and subtract them from a constant (for this comparison it doesn't matter that the lower 32 bits have been dropped), which is an approximate division in the logarithmic domain.
The next step I'm less clear on because the lookup table is indexed by a very tiny amount of the original number--mostly exponent bits. So there's a correction happening but I'm not sure how to interpret it.
Finally we cast the integer bits (plus 32 0s for the LS half) as a floating point double, giving us the approximate anti-logarithm.
So I understand (or think I do) 3 of the 4 steps here, but the third step--what the lookup table is doing, how it is indexed over the range of the doubles, and why?
xi = *(unsigned long long int*)&x;
This reinterprets the bits representing the double
x
as an unsigned long long
, which should be a 64-bit type. This method of reinterpretation is not defined by the C standard but is supported by many compilers, possibly subject to command-line switches.
Reinterpreting as an unsigned long long
makes the bits accessible to manipulation.
/* Extract only the first half of the bits of the integer */
x0 = (xi & 0xffffffff00000000) >> 32;
As it says, x0
now contains the high 32 bits of the double
. This includes the sign bit (which is presumably 0, as we should only be taking the inverse square root of a positive number), 11 bits encoding the exponent, and 20 bits of the principal encoding of the significand.
/* Perform bit shift and subtract from an integer constant to get k */
k = 0x5fe80000 - (x0 >> 1);
x0 >> 1
starts an approximation of the square root. Let’s see why: A binary floating-point representation is ±f•2e, for some f in a certain fixed-point format and an integer e in a certain range. For positive numbers, we of course have f•2e without the sign. Let q and r be the quotient and remainder of dividing e by 2, so e = 2q+r. Then sqrt(f•2e) = sqrt(f•2r)•2q. In x0 >> 1
, we have shifted the exponent bits right, setting us up to have q in the exponent field.
Not quite, though, because the raw exponent field is biased. It actually contains e+1023. Shifting gives us e/2+511½, with that fraction shifting out of the field. But we will deal with this. In particular, the bias needs to be reset to 1023, but that is built into the 5FE8000016 constant.
Next, - (x0 >> 1)
changes this to an approximation of the inverse square root, simply because negating the exponent takes the reciprocal. The significand is still a problem, though.
Then 0x5fe80000 - (x0 >> 1)
is the double
encoding of this very crude approximation of the inverse square root with 5FE8000016 added.
Recall the exponent encoding needs to have a bias of 1023 from the exponent, but we cut it in half and negated it, so - (x0 >> 1)
has a bias of −511½ that needs to be adjusted to +1023, so we need to add 1534½. That needs to go into the exponent field, which is at bits 62 to 52 in the 64-bit double
encoding, so at 30 to 20 in this 32-bit portion from the high half. So we need to add 1534½ starting at bit 20, or 1534½•220 = 1,609,039,872 = 5FE8000016.
Ta da! The exponent bits of 0x5fe80000 - (x0 >> 1)
are exactly the exponent bits of the double
2−q. And its significand bits are the shifted-out exponent bit plus the original significand bits shifted right one position (with one shifted out and lost). (Except note that shifted-out exponent bit has been inverted—it had that bit in the 816 added to it.)
Now we have the correct exponent, but the significand bits in bits 19 to 0 are wacky. They contain one bit from the exponent and some negated and shifted bits from the significand encoding. To get an approximation of the inverse square root, we would have to fix these up. We could take the shifted bits and work out what the original significand was (up to the precision given by the 19 bits we have from it) and multiply by 2r, and that would give us the portion of the original number that is not accounted for in the exponent we have prepared so far. Then we take the inverse square root of that portion and put it in place of the significand, and we are done.
That’s a fair amount of calculation to do. Instead, we are going to precompute the results with some other software and record the results in a table.
63 & (k >> 14)
extracts bits 19 to 14, leaving them in bits 5 to 0. Those are the shifted-out bit from the exponent and the first five significand bits. So they are the most significant bits of the original number that are not already accounted for in the exponent field we calculated.
T2[63 & (k >> 14)]
uses those bits to look up a value in a table. That value is precomputed with other software. It contains the adjustment from what the bits in k
are to what we want them to be. Then k - T2[63 & (k >> 14)]
applies that adjustment, so it is the approximation of an inverse square root (specifically the high 32 bits of the double
encoding of that approximation).
The material you have referenced does not say how the table was computed. Since each table entry is indexed by only 6 bits, it will be used for many double
values of x
. The entry might have been computed by calculating the inverse square root for the least value of x
that entry would be used for, the inverse square root for the greatest value of x
that entry would be used for, and taking some value between those two. Then the table entry is set so that, when it is subtracted from k
, it cancels the six bits used to to index the table and then adds bits that encode the target value. (Like the exponent adjustment added 511½ and also 1023, we want the table entry to subtract the unwanted 63 & (k >> 14)
and add the desired bits.)
There is some finesse involved in this, because the table entry can cancel the six bits used to index the table (because, for each table entry, we know exactly which six bits were used to look it up), but it cannot cancel the lower bits (because they are not part of the index and can vary). So some additional work is needed to select which particular target value to use. Also, which target value to use is affected by whether we want to minimize absolute error or relative error or something else. What you have shown in the question does not speak to this, so I cannot say precisely how the table was calculated.
Note the code is incomplete and should not be used as an inverse square root without further work or precaution. I expect it does not correctly handle subnormals, infinities, or NaNs.
Norbert Juffa offers this alternative table to minimize relative error in the approximation resulting from above code (which may be only an initial approximation to be used in further refinement steps):
uint32_t T2_NJ[64] =
{
0x01289, 0x02efb, 0x04d6a, 0x06b05, 0x087c3, 0x0a39b, 0x0be82, 0x0d86e,
0x0f153, 0x10927, 0x11fdb, 0x13563, 0x149af, 0x15cb1, 0x16e57, 0x17e91,
0x18d4a, 0x19a6e, 0x1a5e7, 0x1af9d, 0x1b777, 0x1bd59, 0x1c123, 0x1c2b7,
0x1c1ef, 0x1bea5, 0x1b8ae, 0x1afdc, 0x1a3fc, 0x194d5, 0x18229, 0x16bb4,
0x16874, 0x17a1c, 0x18aa4, 0x19a00, 0x1a824, 0x1b501, 0x1c08b, 0x1cab1,
0x1d365, 0x1da95, 0x1e02e, 0x1e41f, 0x1e652, 0x1e6b1, 0x1e525, 0x1e194,
0x1dbe4, 0x1d3f8, 0x1c9b0, 0x1bcea, 0x1ad82, 0x19b51, 0x1862c, 0x16de4,
0x15248, 0x1331f, 0x1102e, 0x0e933, 0x0bde5, 0x08df6, 0x0590c, 0x01ec8
};