
What is causing floating-point-to-unsigned conversion discrepancies here?


If I run this code:

#include <iostream>
#include <cstdint>

int main()
{
    double a = -2.0;
    const double b = -2.0;
    using std::cout;

    cout << "Direct cast double -> uint16:\n";
    cout << "a1: " << static_cast<std::uint16_t>(a) << "\n";
    cout << "b1: " << static_cast<std::uint16_t>(b) << "\n";
    auto a2 = static_cast<std::uint16_t>(a);
    auto b2 = static_cast<std::uint16_t>(b);
    cout << "a2: " << a2 << "\n";
    cout << "b2: " << b2 << "\n";

    cout << "Indirect cast double -> uint16:\n";
    cout << "a3: " << static_cast<std::uint16_t>(static_cast<std::int16_t>(a)) << "\n";
    cout << "b3: " << static_cast<std::uint16_t>(static_cast<std::int16_t>(b)) << "\n";
    return 0;
}

I get the following results:

GCC x86-64:

Direct cast double -> uint16:
a1: 0
b1: 0
a2: 0
b2: 0
Indirect cast double -> uint16:
a3: 65534
b3: 65534

Clang x86-64:

Direct cast double -> uint16:
a1: 540684641
b1: 540684642
a2: 540684897
b2: 540684898
Indirect cast double -> uint16:
a3: 65534
b3: 65534

My questions are:

  • Which conversions are implementation-defined here? I know floating-point numbers have limited accuracy, but I would never expect a1, b1, a2 and b2 to be four different values. This looks like a compiler bug to me, especially given that 2^16 = 65536, so these values cannot even fit in a uint16_t.
  • What are the maximum allowed differences between compilers and platforms for this program? The code originates from production code, where we discovered significant differences in values when it was run on a different platform.
  • Why does inserting an additional (intermediate) cast change the result? Does introducing an intermediate type change the semantics (meaning) of the code in the case of a3 and b3?

https://godbolt.org/z/K4njaKT7c


Solution

  • According to the standard (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf, Sec. 4.9, §1), the direct casts cause undefined behavior:

    A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type.

    UB means anything can happen; the program is even allowed to output values that are not representable by uint16_t (such as 540684641).