Search code examples
cfloating-pointlong-double

Get the digits of a long double


I am trying to implement a simple version of a function similar to dtoa but without arbitrary precision. I name it ftob which also handles arithmetic bases other than 10 (2-36). It works with long double, which on my machine is : x86 extended pricision

The function works fine, but on certain values it gives erroneous and unacceptable results, e.g. 2.5600 ,

here is my code:

#include<stdio.h>
#include<math.h>

char *ftob(long double ld, char *str, unsigned short int n_digits, unsigned short int base)
{
    long double ftob_tmp;
    short unsigned index = 1;
    short const sign = (ld < 0.0L) ? -1 : 1;
    short int i = 0, j = 0 , k = 0;

    //check base, number.
    if(base < 2 || base > 36)
        return NULL;
    else if(ld == INFINITY) {
        str = "inf";
        return str;
    }
    else if(ld == -INFINITY) {
        str = "-inf";
        return str;
    }
    else if(__isnanl(ld)) {
        str = "nan";
        return str;
    }
    //initialisations
    (sign == -1) ? str[i++] = '-' : 0;
    ftob_tmp = sign * ld;
    while(ftob_tmp > 0.0L && ftob_tmp < 1.0L) {
        ftob_tmp *= base;
        j++;
    }
    while(ftob_tmp >= base) {
        ftob_tmp /= base;
        j--;
    }
    //reinitialise
    ftob_tmp = sign * ld;
    if(ftob_tmp >= 0.0L && ftob_tmp < 1.0L) {
        str[i++] = '0';
        str[i++] = '.';
        for(k = 0; k < j - 1 && k < n_digits - 1; k++)
            str[i++] = '0';
        n_digits -= j;
    }
    else if(ftob_tmp >= base)
        k = i - j + 1;
    else
        k = i + 1;
    ftob_tmp *= powl(base, --j);
//  printf("%0.20Lf\n", ftob_tmp); /*debug message*/

    //main loop
    for(n_digits += i; i < n_digits; i++) {
        if(i == k)
            str[n_digits++, i] = '.';
        else {
//          printf("%0.20Lf * %Lf = %0.20Lf\n", ftob_tmp, powl(base, index), ftob_tmp * powl(base, index)); /* debug message*/
            str[i] = (int)fmodl((ftob_tmp * powl(base, index++)), base);
            str[i] += (str[i] < 10) ? '0' : 'A' - 10;
        }
    }

    //finalise
    str[i] = '\0';

    return str;
}

int main(void)
{
    char ftl[300];

    printf("ftl = \"%s\"\n", ftob(2.56L, ftl, 19, 10));
    return 0;
}

the output for ftob(2.56L, ftl, 19, 10) is:

ftl = "2.550990990990990990"

Uncommenting the debug message gives:

0.25599999999999999999
0.25599999999999999999 * 10.000000 = 2.55999999999999999995
0.25599999999999999999 * 100.000000 = 25.59999999999999999861
0.25599999999999999999 * 1000.000000 = 255.99999999999999998612
0.25599999999999999999 * 10000.000000 = 2560.00000000000000000000
0.25599999999999999999 * 100000.000000 = 25599.99999999999999822364
0.25599999999999999999 * 1000000.000000 = 255999.99999999999998578915
0.25599999999999999999 * 10000000.000000 = 2560000.00000000000000000000
0.25599999999999999999 * 100000000.000000 = 25599999.99999999999818101060
0.25599999999999999999 * 1000000000.000000 = 255999999.99999999998544808477
0.25599999999999999999 * 10000000000.000000 = 2560000000.00000000000000000000
0.25599999999999999999 * 100000000000.000000 = 25599999999.99999999813735485077
0.25599999999999999999 * 1000000000000.000000 = 255999999999.99999998509883880615
0.25599999999999999999 * 10000000000000.000000 = 2560000000000.00000000000000000000
0.25599999999999999999 * 100000000000000.000000 = 25599999999999.99999809265136718750
0.25599999999999999999 * 1000000000000000.000000 = 255999999999999.99998474121093750000
0.25599999999999999999 * 10000000000000000.000000 = 2560000000000000.00000000000000000000
0.25599999999999999999 * 100000000000000000.000000 = 25599999999999999.99804687500000000000
0.25599999999999999999 * 1000000000000000000.000000 = 255999999999999999.98437500000000000000
0.25599999999999999999 * 10000000000000000000.000000 = 2560000000000000000.00000000000000000000
ftl = "2.550990990990990990"

The source of the error seems to be that 0.256 cannot be represented exactly in long double and has a value about 0.255999999999999999989374818710.
But i am okay if I get output:

flt = "2.5599999999999999999"

instead of :

flt = "2.5600000000000000000"

The problem is that in the "main loop" in the fourth round it is arbitrarily rounded to 2560.00000 which causes str[i] to be set to 0 instead of 9. This is also because 2559.99999999999999... cannot be represented in long double.
But I only need '2559' to be representable so str[i] can be set to 9. (and so for every round in the loop).

I request advice on how I can achieve this, or if it is achievable at all.

Thanks in Advance,


Solution

  • Rounding error amplified by mod

    ftob_tmp * powl(...) product likely needs to round to nearest long double and so is not the exact math result. This rounded product is then modded and sometimes returns 0 or 9 as it is on or just under an integer value for the later digits of 0.255999999999999999999.

    //                  v- rounding introduced error -v
    str[i] = (int)fmodl((ftob_tmp * powl(base, index++)), base);
    //            ^-- error magnified -----------------^
    

    With more debug info, one can see the sometimes 0, sometimes 9 when only 9 was expected.

    printf("bbb %0.20Lf * %Lf = %0.20Lf  %d\n", 
        ftob_tmp, powl(base, index), ftob_tmp * powl(base, index), 
        (int) fmodl((ftob_tmp * powl(base, index++)), base));
    
    bbb 0.25599999999999999999 * 100.000000 = 25.59999999999999999861  2
    bbb 0.25599999999999999999 * 10000.000000 = 2560.00000000000000000000  5
    bbb 0.25599999999999999999 * 1000000.000000 = 255999.99999999999998578915  9
    bbb 0.25599999999999999999 * 100000000.000000 = 25599999.99999999999818101060  0
    bbb 0.25599999999999999999 * 10000000000.000000 = 2560000000.00000000000000000000  9
    bbb 0.25599999999999999999 * 1000000000000.000000 = 255999999999.99999998509883880615  9
    bbb 0.25599999999999999999 * 100000000000000.000000 = 25599999999999.99999809265136718750  0
    bbb 0.25599999999999999999 * 10000000000000000.000000 = 2560000000000000.00000000000000000000  9
    ...
    

    how I can achieve this, or if it is achievable at all (?)

    Yes, achievable but not with OP's approach as too much error is injected in various steps. These corner cases are quite difficult and usual oblige an wide or extended integer computation instead of floating point.

    Example code to print a double in base 10 exactly may help.


    Other lesser issues

    More rounding error

    The loops with ftob_tmp *= base and ftob_tmp /= base each inject up to a 0.5 ULP error. These loops can then form an off-by-one j calculation.

    -0.0

    Test the sign, not the value, else -0.0 will print as 0.0.

    // sign = (ld < 0.0L) ? -1 : 1;
    sign = signbit(ld) ? -1 : 1;
    

    String size

    char ftl[300]; is insufficient for LDBL_MAX in base 2. Look to LDBL_MAX_EXP, LDBL_MIN_EXP to help determine minimum max string size.