Tags: c++, gcc, arm64

Conversion of float to integer in ARM based system


I have the following piece of code, called main.cpp, that reinterprets an IEEE 754 32-bit hex value as a float and then converts it to an unsigned short.

#include <iostream>
using namespace std;

int main() {
    unsigned int input_val = 0xc5dac022;
    float f;
    *((int*) &f) = input_val;
    unsigned short val = (unsigned short) f;
    cout <<"Val = 0x" << std::hex << val << endl;
}

I build and run the code using the following command:

g++ main.cpp -o main
./main

When I run this code on my normal PC, I get the correct answer, which is 0xe4a8. But when I run the same code on an ARM processor, it gives an output of 0x0.

Is this happening because I am building the code with the normal gcc instead of an aarch64 gcc? The code gives the correct output for some other test cases on the ARM processor but gives an incorrect output for this particular test value. How can I solve this issue?


Solution

  • First, your "type pun" via pointers violates the strict aliasing rule, as mentioned in comments. You can fix that by switching to memcpy.
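    For example, a minimal sketch of the memcpy version, reusing the names from the question (the final print is only there to show what the bit pattern decodes to):

    #include <cstring>
    #include <iostream>

    int main() {
        unsigned int input_val = 0xc5dac022;
        float f;
        static_assert(sizeof f == sizeof input_val, "float and unsigned int must have the same size");
        std::memcpy(&f, &input_val, sizeof f);  // well-defined bit copy, no aliasing violation
        std::cout << f << std::endl;            // prints roughly -7000.02
    }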

    Next, the bit pattern 0xc5dac022 as an IEEE-754 single-precision float corresponds to a value of about -7000 (roughly -7000.017). This is truncated toward zero to -7000, which, being negative, cannot be represented in an unsigned short. As such, attempting to convert it to unsigned short has undefined behavior, per [7.3.10 p1] in the C++ standard (C++20 N4860). Note this is different from the situation of converting a signed or unsigned integer to unsigned short, which would have well-defined "wrapping" behavior.

    So there is no "correct answer" here. Printing 0 is a perfectly legal result, and is also logical in some sense, as 0 is the closest unsigned short value to -7000. But it's also not surprising that the result would vary between platforms / compilers / optimization options, as this is common for UB.
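    If you want a result that is well defined on every platform, you have to decide what out-of-range values should map to and handle them before the conversion. As one possible policy (an assumption on my part, not something your question specifies), saturating into the unsigned short range could look like this sketch:

    #include <algorithm>   // std::clamp needs C++17
    #include <cmath>

    // Hypothetical helper: clamp into unsigned short's range first, so the
    // float-to-integer conversion is always well defined. Assumes a 16-bit
    // unsigned short (max 65535), which is the case on mainstream platforms.
    unsigned short to_ushort_saturating(float f) {
        if (std::isnan(f)) return 0;              // pick whatever NaN policy you prefer
        f = std::clamp(f, 0.0f, 65535.0f);
        return static_cast<unsigned short>(f);    // value is now in range, so no UB
    }

    With this, to_ushort_saturating(-7000.02f) returns 0 on every platform, and values in [0, 65535] convert as you would expect.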


    There is actually a difference between ARM64 and x86-64 that explains why this is the particular behavior you see.

    When compiling without optimization, in both cases, gcc emits instructions to actually convert the float value to unsigned short at runtime.

    ARM64 has a dedicated instruction fcvtzu that converts a float to a 32-bit unsigned int, so gcc emits that instruction, and then extracts the low 16 bits of the integer result. The behavior of fcvtzu with a negative input is to output 0, and so that's the value that you get.

    x86-64 doesn't have such an instruction. The nearest thing is cvttss2si, which converts a single-precision float to a signed 32-bit integer. So gcc emits that instruction, then uses the low 16 bits of the result as the unsigned short value. This gives the right answer whenever the input float is in the range [0, 65536), because all those values fit in the range of a 32-bit signed integer. GCC doesn't care what it does in all other cases, because they are UB according to the C++ standard. But it so happens that, since your value -7000 does fit in a signed int, cvttss2si returns the signed integer -7000, which is 0xffffe4a8. Extracting the low 16 bits gives you the 0xe4a8 that you observed.
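    In well-defined C++ arithmetic, the two hardware code paths amount to roughly the following sketch (the -7000 and 0 intermediate values stand in for what cvttss2si and fcvtzu actually produce at runtime):

    #include <cstdint>
    #include <iostream>

    int main() {
        // x86-64 path: cvttss2si yields the signed integer -7000 (0xffffe4a8),
        // and the low 16 bits are then kept.
        std::int32_t x86_result = -7000;
        unsigned short x86_low16 = static_cast<unsigned short>(x86_result);  // wraps to 0xe4a8

        // ARM64 path: fcvtzu saturates the negative input to 0,
        // and the low 16 bits of that are still 0.
        std::uint32_t arm_result = 0;
        unsigned short arm_low16 = static_cast<unsigned short>(arm_result);  // 0x0

        std::cout << std::hex << "x86: 0x" << x86_low16
                  << "  arm: 0x" << arm_low16 << std::endl;                  // x86: 0xe4a8  arm: 0x0
    }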

    When optimizing, gcc on both platforms folds the conversion into a constant 0 at compile time, which is also perfectly legal.