Search code examples
assemblyx86att

how to convert 64bit number to string in x86 assembly?


I am looking for a way to convert 64bit number to a string (and possibly the other way around) using 32bit system. I'm not asking for code, just asking for some ideas.


Solution

  • The only hard part is dividing a 64bit number by 10 on a 32bit machine. Everything else is pretty much the same as the normal case where numbers fit in a single register.

    Often you can look at gcc output for hints on how to do things in asm, but in this case it just calls the __udivdi3 libgcc helper function :/

    If you're just doing this as a learning exercise, then probably you should just look up an extended-precision div algorithm and use it. Here's one, from book, using Intel syntax and 16bit operations. The variable-names are clear, and there's explanatory text, so you should be able to re-implement it for 32bit. Google on that phrase for more hits, and / or look at the libgcc source code.

    See also implementing school-like division on 32bit chunks on x86


    If you're implementing this for real (for high performance):

    Remember that x86's div instruction does a 64b/32b -> 32b division (but faults if the quotient overflows a 32bit register). So you could check if the low bits of your high dword are small enough, and if so you only need a single division for the first step to get the high digit.

    As soon as your number is small enough to divide with a single div, break out of the extended-precision loop and use a single div per digit.

    That probably only takes one iteration to reduce down to a 32bit number. At that point you can divide by 10 using the multiplicative inverse:

    // from the godbolt link: gcc5.3 -O3 -m32
    uint32_t div10_u32(uint32_t x) { return x/10; }
        movl    $-858993459, %edx     # 0xcccccccd
        movl    %edx, %eax            # gcc is dumb: no need for this mov.  clang avoids it
        mull    4(%esp)
        movl    %edx, %eax
        shrl    $3, %eax
        ret
    

    Note how this uses the high half of the result of a full-multiply (32bx32b->64b).


    It might be faster to do the whole thing using multiplicative inverses, even though that means doing a 64 x 64b -> 128b multiply on a 32bit machine. Integer division is very slow, and barely pipelined, but integer mul is very fast on Intel CPUs.

    AVX512-DQ adds a 64x64 -> 64b low multiply instruction, but that doesn't for extended precision. AVX512-IFMA adds 52bx52b low and high multiply instructions, so in a few years it might be worth having a code-path for that (32bit binaries running on hardware with AVX512-IFMA), when the top 64-52 bits of your number is all-zero.