Tags: c++, floating-point, denormal-numbers

Why does comparing a small floating-point number with zero yield random results?


I am aware that floating-point numbers are tricky. But today I encountered a case that I cannot explain (and cannot reproduce with a standalone C++ program).

The code within a large project looks like this:

int i = 12;

// here goes several function calls passing the value i around, 
// but using different types (due to unfortunate legacy code)
... 

float f = *((float*)(&i)); // f=1.681558e-44

if (f == 0) {
    do something;
} else {
    do something else;
}

This piece of code causes random behavior. Using gdb, I identified that it comes from the comparison f == 0, which gives random results, i.e., sometimes true, sometimes false. The bug was that, before using f, the code should have checked (using other auxiliary information) whether the 4 bytes were meant to be interpreted as an integer. The fix was to convert the value back to an integer first and compare that integer with 0, which solved the problem.
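
A minimal sketch of that fix (variable names are my own; memcpy is used for the round-trip instead of a pointer cast, which also sidesteps the strict-aliasing issue):

#include <string.h>
#include <stdio.h>

int main(void){
    int i = 12;
    float f;
    memcpy(&f, &i, sizeof f);   // the legacy int-to-float round-trip

    int j;
    memcpy(&j, &f, sizeof j);   // recover the original bit pattern as an integer
    printf("%s\n", (j == 0) ? "zero" : "non-zero");  // integer compare; the FPU never sees the value
    return 0;
}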

Also, in the cases where a genuine floating-point comparison is needed (i.e., where the float is not cast from an integer as shown above), I changed the comparison to abs(f) < std::numeric_limits<float>::epsilon(), to be on the safe side.
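
For reference, here is what that check looks like as a small helper (a sketch with a name of my choosing; note that std::numeric_limits<float>::epsilon() is the gap between 1.0f and the next representable float, roughly 1.19e-7, so every denormal falls below it):

#include <cmath>
#include <cstdio>
#include <limits>

// hypothetical helper name: treats zero, denormals, and tiny normals as zero
bool effectively_zero(float f) {
    return std::fabs(f) < std::numeric_limits<float>::epsilon();
}

int main() {
    float tiny = 1.0e-44f;   // a denormal, well below the smallest normal float (~1.18e-38)
    std::printf("%s\n", effectively_zero(tiny) ? "zero" : "non-zero");  // prints "zero"
}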

After that, I wanted to reproduce the problem with a simple test program, but I could not. (The compiler used for the project is different from the one I used for the test program, though.) The following is the test program:

#include <stdio.h>

int main(void){
    int i = 12;
    float f = *(float*)(&i);

    for (int n = 0; n < 5; ++n) {
        printf("f=%e %s\n", f, (f == 0)? "=0": "!=0");
    }
    return 0;
}

I am wondering: what could be the reason for the random behavior of the comparison with zero?


Solution

  • Barring the undefined behavior, which can easily be fixed, you're seeing the effect of denormal numbers. They're extremely slow (see Why does changing 0.1f to 0 slow down performance by 10x?), so modern FPUs usually provide denormals-are-zero (DAZ) and flush-to-zero (FTZ) flags to control the denormal behavior. When DAZ is set, denormals compare equal to zero, which is what you observed.

    Currently you'll need platform-specific code to control these flags. Here's how it's done on x86:

    #include <string.h>
    #include <stdio.h>
    #include <pmmintrin.h>
    
    int main(void){
        int i = 12;
        float f;
        memcpy(&f, &i, sizeof i);   /* reinterpret the bits without violating strict aliasing */
    
        /* treat denormal inputs as zero and flush denormal results to zero */
        _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
        printf("%e %s 0\n", f, (f == 0) ? "=": "!=");
    
        /* restore the default handling of denormals */
        _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_OFF);
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_OFF);
        printf("%e %s 0\n", f, (f == 0) ? "=": "!=");
    
        return 0;
    }
    

    Output:

    0.000000e+00 = 0
    1.681558e-44 != 0
    

    Demo on Godbolt
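
    If you want to check whether a build environment already runs with these flags set (some toolchains enable FTZ/DAZ at program startup, e.g. when fast-math options are used, which could explain why the behavior differs between compilers), here's a small sketch reading the raw MXCSR register on x86:

    #include <xmmintrin.h>
    #include <stdio.h>
    
    int main(void){
        unsigned int csr = _mm_getcsr();
        printf("FTZ %s, DAZ %s\n",
               (csr & 0x8000) ? "on" : "off",    /* bit 15: flush-to-zero */
               (csr & 0x0040) ? "on" : "off");   /* bit 6: denormals-are-zero */
        return 0;
    }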
