Tags: sql-server, floating-point, floating-accuracy

How does SQL Server store integer values of over 2^24 in a FLOAT(53) column?


I have a SQL Server database from which I can read integer values of (way) over 2^24 from a column of datatype float. This confuses me, since such values should be too large to be represented exactly in a float. I managed to create a reproduction in my local DB.

To the best of my knowledge, 16777216 (=2^24) and 16777217 should have the same binary floating point representation. They are both mapped to 0 10010111 00000000000000000000000.
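That rounding can be checked directly. A small Python sketch (not T-SQL; Python's `struct` module packs IEEE-754 binary32, which is what I assumed `float` meant) shows both integers collapsing onto the same bit pattern:

```python
import struct

# Pack both integers as IEEE-754 binary32 and compare the raw bit patterns.
bits_16777216 = struct.pack('<f', 16777216.0)  # 2**24
bits_16777217 = struct.pack('<f', 16777217.0)  # rounds to 2**24 in binary32

# Identical bytes: binary32 has only a 24-bit significand (23 stored bits),
# so 16777217 cannot be represented and rounds to 16777216.
assert bits_16777216 == bits_16777217

# Decode sign | exponent | significand of the shared pattern.
(word,) = struct.unpack('<I', bits_16777216)
sign = word >> 31
exponent = (word >> 23) & 0xFF
significand = word & 0x7FFFFF
print(sign, f'{exponent:08b}', f'{significand:023b}')
# -> 0 10010111 00000000000000000000000
```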

Now consider this:

DROP TABLE IF EXISTS F_ING_FLOATS;
CREATE TABLE F_ING_FLOATS (COL FLOAT(53));

INSERT INTO F_ING_FLOATS
    VALUES (16777216), (16777217);

Since I expect both values to be mapped to the same float, I would also expect the distinction between the two to be lost in the DB. However, the query

SELECT 
    COL, 
    COL + 1 [COL + 1],
    COL + 2 [COL + 2] 
FROM 
    F_ING_FLOATS;

annoyingly returns the correct values:

COL         COL + 1     COL + 2
------------------------------------------
16777216    16777217    16777218
16777217    16777218    16777219

I had expected to see 16777216 everywhere in the first two columns and 16777218 in both rows of the last column.

What is happening here that is beyond my grasp? Is SQL Server actually storing something other than 32-bit floating-point values, even though I defined COL as FLOAT(53)?
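The expected-vs-actual behavior can be contrasted in Python (an illustration, not what SQL Server executes; Python's `float` is IEEE-754 binary64, the same format as FLOAT(53)):

```python
import struct

def to_binary32(x):
    # Round-trip through IEEE-754 binary32, simulating what a 32-bit
    # float column would store. struct '<f' packs single precision.
    return struct.unpack('<f', struct.pack('<f', x))[0]

# The expectation from the question (binary32 semantics):
# 16777217 collapses onto 16777216, the distinction is lost.
assert to_binary32(16777216.0) == to_binary32(16777217.0) == 16777216.0

# What FLOAT(53) actually does (binary64 semantics, like Python's float):
# both values are representable and arithmetic keeps them distinct.
assert 16777216.0 != 16777217.0
assert 16777217.0 + 1 == 16777218.0
```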


The reason this is a problem for me is that the values from the SQL Server DB get loaded into Parquet files in an Azure data lake. There, in the data lake, I see exactly what I expected: COL has two rows, both with value 16777216.

I'm in the process of figuring out what is actually happening, and my suspicion is that SQL Server is not entirely honest when it says that COL is FLOAT(53).


To further elaborate on my confusion: I used this online converter to find out what the internal representation is. It tells me that 16777216 and 16777217 have the same representation, which would explain why the Parquet files in my data lake can't distinguish them (or at least I think it explains it).


Solution

  • For FLOAT(53), SQL Server uses the IEEE-754 binary64 format, which has a 53-bit significand. That format can distinguish 16,777,216 and 16,777,217.

    For FLOAT(24), the binary32 format would be used, and that format cannot distinguish 16,777,216 and 16,777,217 because it cannot represent 16,777,217.

    (Note that 16,777,216 and 16,777,217 do not “have the same binary floating point representation.” Rather, 16,777,217 does not have any binary32 representation at all. What happens is that, when 16,777,217 is converted to the binary32 format, from text or from another numerical format, a rounded result is produced. The result is exactly 16,777,216; it is not a representation of 16,777,217. This distinction is crucial for correct analysis, understanding, design, and proofs of floating-point operations. The operations, including conversion, approximate real-number operations, but the represented values are exact.)
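    A quick Python check of the binary64 claim (Python's `float` is binary64, the same format as FLOAT(53)): the analogous collapse happens at 2^53, not 2^24.

    ```python
    # binary64 has a 53-bit significand: every integer up to 2**53 is
    # exactly representable, so 16,777,217 is well within range.
    assert float(16777217) == 16777217

    # Below 2**53, consecutive integers stay distinct.
    assert 2.0**53 - 1 != 2.0**53

    # At 2**53 the gap between representable values becomes 2, so
    # 2**53 + 1 rounds back to 2**53 — the binary64 analogue of the
    # binary32 behavior at 2**24.
    assert 2.0**53 == 2.0**53 + 1
    ```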