Search code examples
calgorithmfloating-pointlibc

Is it possible to implement nextafter w/o obtaining a binary representation?


Usually nextafter is implemented in the following way:

double nextafter(double x, double y)
{
    // handle corner cases

    int delta = ((x > 0) == (x < y)) ? 1 : -1;
    unsigned long long mant = __mant(x);  // get mantissa as int
    mant += delta;
    ...
}

Here, a binary representation is obtained using __mant(x).

Out of curiosity: is it possible to implement nextafter without obtaining a binary representation? For example, using a sequence of arithmetic floating point operations.


Solution

  • The code below implements nextafter in the ascending direction for finite values for IEEE-754 arithmetic with round-to-nearest-ties-to-even. Handling of NaNs, infinities, and the descending direction is obvious.

    Without assuming IEEE-754 or round-to-nearest, the floating-point properties are sufficiently well characterized by C 2018 5.2.4.2.2 that we we can implement nextafter (again in the ascending direction) this way:

    • If the input is a NaN, return it, and report an error if it is a signaling NaN.
    • If the input is −∞, return -DBL_MAX.
    • If the input is -DBL_TRUE_MIN, return zero.
    • If the input is zero, return +DBL_TRUE_MIN.
    • If the input is +DBL_MAX, return +∞.
    • If the input is +∞, return +∞. (Note this never occurs with a full nextafter(x, y) implementation, as it moves the first argument in the direction of the second argument, so we never ascend from +∞ because we never receive a second argument greater than +∞.)
    • Otherwise, if it is positive, use logb to find the exponent e. If e is less than DBL_MIN, return the input plus DBL_TRUE_MIN (the ULP of subnormals and the lowest normals). If e is not less than DBL_MIN, return the input plus scalb(1, e + 1 - DBL_MANT_DIG) (the specific ULP for the input). Rounding method is irrelevant as these additions are exact.
    • Otherwise, the input is negative. Use the above except if the input is exactly a power of FLT_RADIX (the input equals scalb(1, e)), decrement the second argument of scalb by one (because this nextafter step transitions from a greater exponent to a lower one).

    Note that FLT_RADIX is correct above; there is no DBL_RADIX; all floating-point formats use the same radix.

    If you want to consider logb and scalb as functions that manipulate the floating-point representation, then they could be replaced by ordinary arithmetic. log could find a quick approximation that could be quickly refined to the true exponent, and scalb can be implemented in a variety of ways, possibly simply a table look-up. If log remains objectionable, then trial comparisons would suffice.

    The above handles formats with or without subnormals because, if subnormals are supported, it steps into them with the decrement, and, if subnormals are not supported, the minimum normal magnitude is DBL_TRUE_MIN, so it is recognized in the above as the point where we step to zero next.

    There is one caveat; the C standard allows it to be “indeterminable” whether an implementation supports subnormals or not “if floating-point operations do not consistently interpret subnormal representations as zero, nor as nonzero.” In that case, I do not see that the standard specifies what the standard nextafter does, so there is nothing for us to do to match it in our implementation. Supposing that subnormals are sometimes supported, DBL_TRUE_MIN must be a subnormal value, and the above will attempt to work as if subnormal support is currently on (e.g., flush-to-zero is off) and, if it is off, you will get whatever you get.

    #include <float.h>
    #include <math.h>
    
    
    /*  Return the next floating-point value after the finite value q.
    
        This was inspired by Algorithm 3.5 in Siegfried M. Rump, Takeshi Ogita, and
        Shin'ichi Oishi, "Accurate Floating-Point Summation", _Technical Report
        05.12_, Faculty for Information and Communication Sciences, Hamburg
        University of Technology, November 13, 2005.
        
        IEEE-754 and the default rounding mode,
        round-to-nearest-ties-to-even, may be required.
    */
    double NextAfter(double q)
    {
        /*  Scale is .625 ULP, so multiplying it by any significand in [1, 2)
            yields something in [.625 ULP, 1.25 ULP].
        */
        static const double Scale = 0.625 * DBL_EPSILON;
    
        /*  Either of the following may be used, according to preference and
            performance characteristics.  In either case, use a fused multiply-add
            (fma) to add to q a number that is in [.625 ULP, 1.25 ULP].  When this
            is rounded to the floating-point format, it must produce the next
            number after q.
        */
    #if 0
        // SmallestPositive is the smallest positive floating-point number.
        static const double SmallestPositive = DBL_EPSILON * DBL_MIN;
    
        if (fabs(q) < 2*DBL_MIN)
            return q + SmallestPositive;
    
        return fma(fabs(q), Scale, q);
    #else
        return fma(fmax(fabs(q), DBL_MIN), Scale, q);
    #endif
    }
    
    
    #if defined CompileMain
    
    
    #include <stdio.h>
    #include <stdlib.h>
    
    
    #define NumberOf(a) (sizeof (a) / sizeof *(a))
    
    
    int main(void)
    {
        int status = EXIT_SUCCESS;
    
        static const struct { double in, out; } cases[] =
        {
            {  INFINITY,                 INFINITY                },
            {  0x1.fffffffffffffp1023,   INFINITY                },
            {  0x1.ffffffffffffep1023,   0x1.fffffffffffffp1023  },
            {  0x1.ffffffffffffdp1023,   0x1.ffffffffffffep1023  },
            {  0x1.ffffffffffffcp1023,   0x1.ffffffffffffdp1023  },
            {  0x1.0000000000003p1023,   0x1.0000000000004p1023  },
            {  0x1.0000000000002p1023,   0x1.0000000000003p1023  },
            {  0x1.0000000000001p1023,   0x1.0000000000002p1023  },
            {  0x1.0000000000000p1023,   0x1.0000000000001p1023  },
    
            {  0x1.fffffffffffffp1022,   0x1.0000000000000p1023  },
    
            {  0x1.fffffffffffffp1,      0x1.0000000000000p2     },
            {  0x1.ffffffffffffep1,      0x1.fffffffffffffp1     },
            {  0x1.ffffffffffffdp1,      0x1.ffffffffffffep1     },
            {  0x1.ffffffffffffcp1,      0x1.ffffffffffffdp1     },
            {  0x1.0000000000003p1,      0x1.0000000000004p1     },
            {  0x1.0000000000002p1,      0x1.0000000000003p1     },
            {  0x1.0000000000001p1,      0x1.0000000000002p1     },
            {  0x1.0000000000000p1,      0x1.0000000000001p1     },
    
            {  0x1.fffffffffffffp-1022,  0x1.0000000000000p-1021 },
            {  0x1.ffffffffffffep-1022,  0x1.fffffffffffffp-1022 },
            {  0x1.ffffffffffffdp-1022,  0x1.ffffffffffffep-1022 },
            {  0x1.ffffffffffffcp-1022,  0x1.ffffffffffffdp-1022 },
            {  0x1.0000000000003p-1022,  0x1.0000000000004p-1022 },
            {  0x1.0000000000002p-1022,  0x1.0000000000003p-1022 },
            {  0x1.0000000000001p-1022,  0x1.0000000000002p-1022 },
            {  0x1.0000000000000p-1022,  0x1.0000000000001p-1022 },
    
            {  0x0.fffffffffffffp-1022,  0x1.0000000000000p-1022 },
            {  0x0.ffffffffffffep-1022,  0x0.fffffffffffffp-1022 },
            {  0x0.ffffffffffffdp-1022,  0x0.ffffffffffffep-1022 },
            {  0x0.ffffffffffffcp-1022,  0x0.ffffffffffffdp-1022 },
            {  0x0.0000000000003p-1022,  0x0.0000000000004p-1022 },
            {  0x0.0000000000002p-1022,  0x0.0000000000003p-1022 },
            {  0x0.0000000000001p-1022,  0x0.0000000000002p-1022 },
            {  0x0.0000000000000p-1022,  0x0.0000000000001p-1022 },
    
            { -0x1.fffffffffffffp1023,  -0x1.ffffffffffffep1023  },
            { -0x1.ffffffffffffep1023,  -0x1.ffffffffffffdp1023  },
            { -0x1.ffffffffffffdp1023,  -0x1.ffffffffffffcp1023  },
            { -0x1.0000000000004p1023,  -0x1.0000000000003p1023  },
            { -0x1.0000000000003p1023,  -0x1.0000000000002p1023  },
            { -0x1.0000000000002p1023,  -0x1.0000000000001p1023  },
            { -0x1.0000000000001p1023,  -0x1.0000000000000p1023  },
    
            { -0x1.0000000000000p1023,  -0x1.fffffffffffffp1022  },
    
            { -0x1.0000000000000p2,     -0x1.fffffffffffffp1     },
            { -0x1.fffffffffffffp1,     -0x1.ffffffffffffep1     },
            { -0x1.ffffffffffffep1,     -0x1.ffffffffffffdp1     },
            { -0x1.ffffffffffffdp1,     -0x1.ffffffffffffcp1     },
            { -0x1.0000000000004p1,     -0x1.0000000000003p1     },
            { -0x1.0000000000003p1,     -0x1.0000000000002p1     },
            { -0x1.0000000000002p1,     -0x1.0000000000001p1     },
            { -0x1.0000000000001p1,     -0x1.0000000000000p1     },
    
            { -0x1.0000000000000p-1021, -0x1.fffffffffffffp-1022 },
            { -0x1.fffffffffffffp-1022, -0x1.ffffffffffffep-1022 },
            { -0x1.ffffffffffffep-1022, -0x1.ffffffffffffdp-1022 },
            { -0x1.ffffffffffffdp-1022, -0x1.ffffffffffffcp-1022 },
            { -0x1.0000000000004p-1022, -0x1.0000000000003p-1022 },
            { -0x1.0000000000003p-1022, -0x1.0000000000002p-1022 },
            { -0x1.0000000000002p-1022, -0x1.0000000000001p-1022 },
            { -0x1.0000000000001p-1022, -0x1.0000000000000p-1022 },
    
            { -0x1.0000000000000p-1022, -0x0.fffffffffffffp-1022 },
            { -0x0.fffffffffffffp-1022, -0x0.ffffffffffffep-1022 },
            { -0x0.ffffffffffffep-1022, -0x0.ffffffffffffdp-1022 },
            { -0x0.ffffffffffffdp-1022, -0x0.ffffffffffffcp-1022 },
            { -0x0.0000000000004p-1022, -0x0.0000000000003p-1022 },
            { -0x0.0000000000003p-1022, -0x0.0000000000002p-1022 },
            { -0x0.0000000000002p-1022, -0x0.0000000000001p-1022 },
            { -0x0.0000000000001p-1022, -0x0.0000000000000p-1022 },
        };
    
        for (int i = 0; i < NumberOf(cases); ++i)
        {
            double in = cases[i].in, expected = cases[i].out;
            double observed = NextAfter(in);
            printf("NextAfter(%a) = %a.\n", in, observed);
            if (! (observed == expected))
            {
                printf("\tError, expected %a.\n", expected);
                status = EXIT_FAILURE;
            }
        }
    
        return status;
    }
    
    
    #endif  // defined CompileMain