I'm trying to understand floating point arithmetic in GNURadio and started looking into their tests. The test generates random float input and random taps, then passes everything to the filter. Later it compares the expected output against the actual output using some error margin.
There is a cryptic comment about that margin:
// we use a sloppy error margin because on the x86 architecture,
// our reference implementation is using 80 bit floating point
// arithmetic, while the SSE version is using 32 bit float point
// arithmetic.
I cannot find 80-bit arithmetic anywhere in the source code. Can anyone help me, or at least explain why the error margin depends on the number of taps?
On x86 with the x87 FPU, even when simply using double you get 80-bit precision for intermediate results, because the FPU register stack uses 80-bit floating point numbers internally. If your code expects and depends on the rounding of 64-bit or 32-bit float math, you can get surprises. For example, I've been hit quite a few times by something like x < y being true, but after assigning z = x you may get z >= y (all variables declared double). This can happen if x ends up being kept in an 80-bit FPU register while z is a real 64-bit floating point variable in memory.
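To make this concrete, here is a minimal sketch of the pattern (whether the surprise actually shows up depends on the compiler, optimization level and target; the names and values are just for illustration, and on x86-64, where g++ uses SSE math by default, the two values will normally compare equal):

```cpp
#include <cstdio>

int main() {
    // volatile keeps the compiler from folding the division at compile time
    volatile double a = 1.0, b = 3.0;

    double x = a / b;        // on x87 this may live in an 80-bit register
    volatile double z = x;   // forced out to a 64-bit memory slot

    // On a 32-bit x87 build (e.g. g++ -m32 -O2, without -ffloat-store) the
    // comparison can mix the 80-bit register value of x with the 64-bit
    // rounded value of z and report that they differ; the same mechanism can
    // flip an ordering such as "x < y" into "z >= y" after z = x.
    std::printf("x == z ? %s\n", (x == z) ? "yes" : "no");
    return 0;
}
```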
g++ has a specific option to avoid these issues (-ffloat-store) that prevents the use of the extra bits, although it slows down math-heavy code quite a bit.
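As for why the error margin depends on the number of taps: each output sample of an FIR filter is a dot product over ntaps terms, and every multiply-accumulate can contribute a rounding error on the order of the machine epsilon times the magnitude of the running sum, so a worst-case bound grows roughly linearly with the tap count. A rough sketch of the idea (this is not GNU Radio's actual test; the names and the margin formula are made up):

```cpp
#include <cmath>
#include <cstdio>
#include <limits>
#include <random>
#include <vector>

int main() {
    const int ntaps = 64;
    std::mt19937 gen(42);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);

    std::vector<float> input(ntaps), taps(ntaps);
    for (int i = 0; i < ntaps; ++i) {
        input[i] = dist(gen);
        taps[i]  = dist(gen);
    }

    // "Reference": accumulate in double, roughly analogous to the higher
    // precision accumulation the GNU Radio comment refers to.
    double reference = 0.0;
    // "SIMD-like": accumulate purely in 32-bit float, like an SSE kernel.
    float actual = 0.0f;
    for (int i = 0; i < ntaps; ++i) {
        reference += static_cast<double>(taps[i]) * input[i];
        actual    += taps[i] * input[i];
    }

    // A sloppy margin that grows with the number of taps, just to show why
    // the tolerance would be a function of ntaps.
    const float margin = ntaps * std::numeric_limits<float>::epsilon();

    std::printf("reference=%.9f actual=%.9f diff=%g margin=%g\n",
                reference, static_cast<double>(actual),
                std::fabs(reference - static_cast<double>(actual)),
                static_cast<double>(margin));
    return 0;
}
```

The more taps you sum, the more rounding steps the float accumulator goes through relative to the higher-precision reference, so the test has to allow a larger discrepancy for longer filters.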