I am looking to take the log base n (10 would be fine) of a 256 bit unsigned integer as a floating point in rust, with no loss of precision. It would seem to me that I need to implement an 8xf64 512 bit float 512 type and use a Taylor series to approximate ln and then the log. I know there are assembly methods to obtain the log of an f64. I am wondering if anyone on stack overflow can think of a divide and conquer or other method which would be more efficient. I would be amenable to inline assembly operating on the 8xf64 512 bit array.
This might be a useful starting point / outline of an algorithm. IDK if it will get you exact results, like error <= 0.5ulp (i.e. the last bit of the mantissa of your 512-bit float correctly rounded), or even error <= 1 ulp. Perhaps worth looking into what extended-precision calculators like bc
/ dc
/ calc
do.
I think log
converges quickly, so if you're going to do Newton iterations to refine, this bit-scan method might be a fast way to get a good starting point. Even if you only really need about 256 mantissa bits correct, I don't know how big a polynomial it would take to get that, and each multiply / add / fma would be on 512-bit (8x) or 320-bit (5x double precision).
For normal-sized floating-point numbers, the usual method takes advantage of the logarithmic nature of binary floating point. Without 256-bit HW float, you'll want to find the ilog2(int) yourself, i.e. position of the highest set bit (Efficiently find least significant set bit in a large array?).
Then treat your 256-bit integer as the mantissa of a number in the [1..2) or [0.5 .. 1) range, and yes use a polynomial approximation for log2() that's accurate over that limited range. (Before actual soft-float stuff, you might want to left-shift the number so it's normalized, i.e. the highest set bit is at the top. i.e. x <<= clz(x)
.
And then add the integer exponent + log_approx(mantissa) => log2(x).
Efficient implementation of log2(__m256d) in AVX2 has more detail on implementing log2(double)
(with SIMD doing 4 at a time, very different from doing one extended precision calculation).
It includes some links to implementations, e.g. Agner Fog's VCL using the ratio of two polynomials instead of one larger polynomial, and various tricks to maintain as much precision as possible: https://github.com/vectorclass/version2/blob/9874e4bfc7a0919fda16596144d393da5f8bf6c0/vectormath_exp.h#L942. Such as further range reduction: if x > SQRT2*0.5
, then increment the exponent and double the mantissa. (If 512-bit FP division is really expensive, you might just use more terms in one polynomial.) VCL is currently Apache licensed, so feel free to copy as much as you want from it into anything.
IDK if there are more tricks that might become more valuable for big extended precision, or for soft-float, which that implementation doesn't use. VCL's math functions spend more effort to maintain high precision than some faster approximations, but they're not exact.
If you don't need more exponent-range than a double
, you might be able to extend the double-double-arithmetic technique to wider floats, taking advantage of hardware FP to get 52 or 53 mantissa bits per 64-bit chunk. (From comments, apparently you're already planning to do that.)
You might not need 512-bit float to have sufficient precision. 256/52 = 4.92, so only 5x double
chunks have more precision (mantissa bits) than your input, and could exactly represent any 256-bit integer. (IEEE double does have a large enough exponent range; -1022 .. +1023). And have enough to spare that log2(int) should map each 256-bit input to a unique monotonic output, even with some rounding error.