c++ c floating-point precision floating-accuracy

Comparing a float and an int


For safety reasons, I need to perform the same computation twice: once with only integer (int32) variables and once with only float (float32) variables. At the end of the computation, the two results are compared.

I read the article about comparing floating point numbers.

There are a few things I don't understand:

  1. I haven't seen the following comparison for float numbers. Assuming a and b are floats, is this way of comparing them correct?

    If !(a > b) && !(a < b) is true, then _a_ and _b_ are probably identical; otherwise they are not.
    
  2. If I cast a float number to an integer, I get the integer part of the number. Why, when I use a union object that defines the same memory as int32 and float32, do I get a different result? Doesn't it cast the float number to int32 as well?


Solution

  • Why, when using a union object that defines the same memory as int32 and float32, do I get a different result?

    The only reason the float/int union even makes sense is that both float and int have a storage size of 32 bits here. What you are missing is that floats (on virtually all current platforms, all floating-point numbers) are stored in IEEE-754 floating-point format: float is single precision, double is double precision, and so on.

    When you use the float/int union trick, you see the integer whose bit pattern is the IEEE-754 single-precision encoding of the float. The two values do not represent the same number. You are simply looking at the same 32 bits of memory either as a float or as an integer. Through the float window, you see what those 32 bits mean as a float. Through the integer window, you see nothing more than what those same 32 bits mean as an integer. An example looking at the binary representation usually helps.

    Take for example, the float value 123.456. If you look at the 32-bits in memory you see:

    The float value entered : 123.456001
    
    binary value in memory  : 01000010-11110110-11101001-01111001
    
    As unsigned integer     : 1123477881
    

    The IEEE-754 single-precision floating-point representation is a specific format in memory made up of the following three components:

     0 1 0 0 0 0 1 0 1 1 1 1 0 1 1 0 1 1 1 0 1 0 0 1 0 1 1 1 1 0 0 1
    |- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -|
    |s|      exp      |                  mantissa                   |
    

    Here s is the sign bit, exp is the 8-bit biased exponent, and the remaining 23 bits are the mantissa. There is no way you can reinterpret the bits of the float 123.456 through the union and get anything close to the integer 123; you are off by about 7 orders of magnitude.

    Doesn't it cast the float number to int32 as well?

    Answer: No