
Float underflow in C explanation


I am solving one of the C Primer Plus exercises dealing with float underflow. The task is to simulate it. I did it this way:

#include <stdio.h>
#include <float.h>

int main(void)
{
    // print min value for a positive float retaining full precision
    printf("%s\n %.150f\n", "Minimum positive float value retaining full precision:", FLT_MIN);

    // print min value for a positive float retaining full precision divided by two
    printf("%s\n %.150f\n", "Minimum positive float value retaining full precision divided by two:", FLT_MIN / 2.0);

    // print min value for a positive float retaining full precision divided by four
    printf("%s\n %.150f\n", "Minimum positive float value retaining full precision divided by four:", FLT_MIN / 4.0);

    return 0;
}

The result is

Minimum positive float value retaining full precision:                 0.000000000000000000000000000000000000011754943508222875079687365372222456778186655567720875215087517062784172594547271728515625000000000000000000000000
Minimum positive float value retaining full precision divided by two:  0.000000000000000000000000000000000000005877471754111437539843682686111228389093327783860437607543758531392086297273635864257812500000000000000000000000
Minimum positive float value retaining full precision divided by four: 0.000000000000000000000000000000000000002938735877055718769921841343055614194546663891930218803771879265696043148636817932128906250000000000000000000000

I expected less precision for the min float value divided by two and by four, but the precision seems fine and there is no underflow. How is this possible? Did I miss something?

Thank you very much


Solution

  • The method of assessing precision is incorrect: the code simply divides FLT_MIN (which is exactly a power of 2) by 2, and halving a power of 2 changes only the exponent, never the significand, so no bits can be lost.

    Instead, start with a number that is just above a power of 2, so its binary significand is something like 1.000...(24 binary digits total)...0001. Also ensure the values printed originate as float. (FLT_MIN/2.0 is a double.)
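    To make the float-versus-double distinction concrete, here is a small sketch; the nextafterf starting point is an illustrative choice, not part of the original code:

    ```c
    #include <float.h>
    #include <math.h>
    #include <stdio.h>

    int main(void) {
      /* Start just above a power of 2 so the lowest significand bit is set. */
      float x = nextafterf(FLT_MIN, 1.0f);   /* 0x1.000002p-126 */

      double d = x / 2.0;   /* double arithmetic: all 24 bits of x survive */
      float  f = x / 2.0f;  /* float arithmetic: result is subnormal, low bit rounds away */

      printf("as double: %a\n", d);
      printf("as float:  %a\n", (double)f);
      return 0;
    }
    ```

    The double quotient keeps the trailing 1 bit, while the float quotient, forced into the subnormal range, rounds it away — this is exactly why printing FLT_MIN/2.0 (a double) hides the underflow.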

    Notice below that precision is lost once the number becomes less than FLT_MIN, the minimum normalized positive floating-point number.

    Also consider FLT_TRUE_MIN, the minimum positive floating-point number (subnormal). See binary32.

    #include <float.h>
    #include <math.h>
    #include <stdio.h>
    
    int main(void) {
      const char *format = "%.10e %a\n";
      printf(format, FLT_MIN, FLT_MIN);
      printf(format, FLT_TRUE_MIN, FLT_TRUE_MIN);
    
      float f = nextafterf(1.0f, 2.0f);
      do {
        f /= 2;
        printf(format, f, f);  // print in decimal and hex for detail
      } while (f);
      return 0;
    }
    

    Output

    1.1754943508e-38 0x1p-126
    1.4012984643e-45 0x1p-149
    
    5.0000005960e-01 0x1.000002p-1
    2.5000002980e-01 0x1.000002p-2
    1.2500001490e-01 0x1.000002p-3
    ...
    2.3509889819e-38 0x1.000002p-125
    1.1754944910e-38 0x1.000002p-126
    5.8774717541e-39 0x1p-127  // lost least significant bit of precision
    2.9387358771e-39 0x1p-128
    ...
    2.8025969286e-45 0x1p-148
    1.4012984643e-45 0x1p-149
    0.0000000000e+00 0x0p+0
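
    Beyond printing values, the floating-point environment can report underflow directly. A minimal sketch, assuming a C99 <fenv.h> implementation where FE_UNDERFLOW follows the IEEE 754 default (raised only when the result is both tiny and inexact); strictly conforming code would also need #pragma STDC FENV_ACCESS ON:

    ```c
    #include <fenv.h>
    #include <float.h>
    #include <stdio.h>

    int main(void) {
      feclearexcept(FE_ALL_EXCEPT);

      /* FLT_TRUE_MIN / 2 is tiny and inexact: it rounds to 0 and raises FE_UNDERFLOW.
         volatile keeps the compiler from folding the division at compile time. */
      volatile float f = FLT_TRUE_MIN;
      f /= 2.0f;

      if (fetestexcept(FE_UNDERFLOW)) {
        printf("FE_UNDERFLOW raised; result = %a\n", (double)f);
      }
      return 0;
    }
    ```

    Note that FLT_MIN / 2.0f alone would not set the flag on a conforming implementation, because that quotient is exactly representable as a subnormal — another reason the original test showed no precision loss.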