Search code examples
floating-pointieee-754

Grid step (difference between two adjacent machine-representable numbers


I wrote such a code to find the grid step(difference between two adjacent machine-representable numbers, machine epsilon the difference between the minimum machine-representable number greater than one and one). Why does this program show 1 for huge values ​​of x and how to fix it to show the correct answer?

#include <stdio.h>
int main(void)
{
    long double x,eps=1.0,a; 
    scanf("%Lg",&x);
    do
    {
        a=eps;
        eps/=2.0;
    }
    while( x+eps>x);
    printf("Grid step: %Le",(long double)a);
    return 0;
}


Solution

  • Floating-point numbers are represented as a sign (+1 or −1) s, an exponent e, and a numeral f with a fixed number of digits in a fixed base (called the significand). With base b and precision (number of digits) p, the number represented with sign s, exponent e, and digits f0, f−1, f−2, f−3, f−4,… f1−p is sbe•sum(fibi for −p < i ≤ 0).

    For example, with base 2, precision 4, sign +1, exponent 1, and significand 1.001, the number represented is +1•21•1.0012 = 2•1⅛ = 2¼ = 9/4 = 2.25.

    Given any representable number within bounds, the next representable number of greater magnitude is obtained by adding 1 to the lowest digit of the significand. With the above example, the next greater number would have significand 1.010 and the same sign and exponent, so it would be +1•21•1.0102 = 2•1¼ = 2½ = 10/4 = 2.5.

    This means that, if a number x is represented with exponent e, then the difference between it and the next greater magnitude representable number is beb1−p = be+1−p.

    The number 1 is represented with exponent 0, since 1 = +1•b0•1.000…000b. So the difference between 1 and the next representable number is b1−p. If your long double format has a base of two and a precision of 64 digits (bits), then this difference is 21−64 = 2−63, so the value of the so-called “machine epsilon” should be 2−63.

    If a binary format is being used, then 2 is represented with an exponent of 1, so the difference between it and the next representable number is be+1−p = 21+1−64 = 2−62. These differences are called ULP, units of least precision.

    The ULP for 4 is 2−61, for 8 is 2−60, and so on. For 263, the ULP is 263−63 = 1.

    When you enter a number that is 264 or larger, your program fails to find the ULP because the ULP is larger than 1, but your program starts looking at 1 and works downward.

    You can fix it either by starting with the largest possible ULP in your long double format or by changing the program to work either upward or downward, depending on the value of x.