Search code examples
adafixed-point

Ada fixed-point rounding of multiplication


The following code prints TRUE that means 0.0191*0.0191 is evaluating to 0.0003 and 0.0192*0.0192 is evaluating to 0.0004. However:

0.0191*0.0191 = 0.00036481
0.0192*0.0192 = 0.00036864

Had the rounding happened at the threshold of 0.00035, the corresponding threshold for square root should have been 0.0187.

If I change the delta to 10.0**(-5), this is not the case.

So my question is "How rounding of fixed-point calculation works in this situation?"

with Ada.Text_IO; use Ada.Text_IO;

procedure Main is
   type T is delta 10.0**(-4) range 0.0 .. 10.0;

   X1 : T := 0.0191;
   X2 : T := 0.0192;
   Y : T := 0.0004;
   B : Boolean;
begin
   B := (X1 * X1 = Y - T'Delta) and (X2 * X2 =Y);
   Put_Line(B'Image);
end Main;

Solution

  • I don’t think that 'rounding' is quite the problem here.

    You say

    type T is delta 10.0**(-4) range 0.0 .. 10.0;
    

    which means (Put_Line (T'Small'Image);) that the least significant bit of a T is 6.103515625E-05.

    This is the first binary power of 2 less than the delta; ARM 3.5.9(8) merely requires it to be less than or equal to the delta.

    So, 0.0191 is represented as 312 in binary, which multiplied by 'Small gives 1.904296875E-02, which squared is 3.62634658813477E-04, which divided by 'Small to get the binary representation is 5.94140625E+00. GNAT uses the permission in ARM 4.5.5(21) ("for ordinary fixed point types, if the mathematical result is between two multiples of the small, it is unspecified which of the two is the result") to convert this to 5, which corresponds to 3.0517578125E-04, giving your 0.0003 result when rounded for printing.

    0.0192 is represented as 314 in binary, which leads to a result of 6.017822265625E+00, converted to 6, which corresponds to 3.662109375E-04, rounded for printing to 0.0004.

    You'll have gathered that I messed around with your code to get these numbers!

    You might want to consider

    Delta_T : constant := 10.0 ** (-4);
    type T is delta Delta_T range 0.0 .. 10.0 with Small => Delta_T;