Single-precision and double-precision IEEE 754 base-2 floating-point values can represent a range of integers without loss.
Given a product A = BC, where B and C are integers represented losslessly as floating-point values, is the product A always lossless if it mathematically falls within the lossless range of the floating-point type?
More specifically, do we know whether common modern processors guarantee that such integer products are computed exactly?
EDIT: To clarify, per the links above, the ranges of integers that can be represented without loss are ±2^53 in double precision and ±16777216 (2^24) in single precision.
EDIT: IEEE-754 requires the result of each operation to be rounded to the nearest representable value, but I specifically want to know about the behavior of modern processors.
For any elementary operation, IEEE-754 requires that, if the mathematical result is representable, then it is the result.
The question is not tagged with IEEE-754 and therefore just asks about floating-point generally. No sensible system would give inaccurate results when exact results are representable, but it would nonetheless be possible to create one.
Here is a program to test the float cases.
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Check that x*y computed in float equals the expected product z.
static void Test(float x, float y, float z)
{
    float o = x*y;
    if (o == z) return;
    printf("Error, %.99g * %.99g != %.99g.\n", x, y, z);
    exit(EXIT_FAILURE);
}


// Run Test for all four sign combinations of x and y.
static void TestSigns(float x, float y, float z)
{
    Test(-x, -y, +z);
    Test(-x, +y, -z);
    Test(+x, -y, -z);
    Test(+x, +y, +z);
}


int main(void)
{
    static const int32_t SignificandBits = 24;
    static const int32_t Bound = 1 << SignificandBits;

    // Test all x * y where x or y is zero.
    TestSigns(0, 0, 0);
    for (int32_t y = 1; y <= Bound; ++y)
    {
        TestSigns(0, y, 0);
        TestSigns(y, 0, 0);
    }

    /*  Iterate x through all non-zero significands but right-adjusted instead
        of left-adjusted (hence making the low bit set, so the odd numbers).
    */
    for (int32_t x = 1; x <= Bound; x += 2)
    {
        /*  Iterate y through all non-zero significands such that x * y is
            representable.  Observe that since x and y each have their low bits
            set, x * y has its low bit set.  Then, if Bound <= x * y, there is
            also a bit set outside the representable significand, so the
            product is not representable.
        */
        for (int32_t y = 1; (int64_t) x * y < Bound; y += 2)
        {
            /*  Test all pairs of numbers with these significands, but varying
                exponents, as long as they are in bounds.
            */
            for (int xs = x; xs <= Bound; xs *= 2)
                for (int ys = y; ys <= Bound; ys *= 2)
                    TestSigns(xs, ys, (int64_t) xs * ys);
        }
    }
}