c performance embedded portability uint64

Inconveniences of using uint64_t

I have a highly portable library (it compiles and works well everywhere, even without a kernel) and I would like it to remain as portable as possible. So far I have avoided 64-bit data types, but I might need to use them now; to be precise, I would need a 64-bit bitmask.

I have never really thought about it and I am not enough of a hardware expert (especially concerning embedded systems), but now I am wondering: what are the inconveniences of using uint64_t (or, equivalently, uint_least64_t)? I can think of two angles on my question:

Actual portability: Are all microcontrollers, including 8-bit CPUs, able to deal with 64-bit integers?
Performance: How slowly will an 8-bit CPU perform bitwise operations on a 64-bit integer compared to a 32-bit integer? The function I am designing will have only one 64-bit variable, but will perform a lot of bitwise operations on it (i.e. in a loop).

Solution

There are various minimum requirements on a conforming C compiler. The C language allows two forms of compilers: hosted and freestanding. Hosted is meant to run on top of an OS, and freestanding runs without an OS. Most embedded systems compilers are freestanding implementations.

Freestanding compilers have some leeway: they do not need to support all of the standard library headers, but they must support a minimum subset of them. This subset includes stdint.h (see C17 4/6), which in turn requires the compiler to implement the following (C17 7.20.1.2/3):

The following types are required:

int_least8_t int_least16_t int_least32_t int_least64_t
uint_least8_t uint_least16_t uint_least32_t uint_least64_t

So a microcontroller compiler does not need to support uint64_t, but it must (oddly enough) support uint_least64_t. In practice this means the compiler might as well provide uint64_t too, since on a target where uint_least64_t is exactly 64 bits wide the two are the same thing.
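As a minimal sketch of how a portable library could lean on the guaranteed type only, the following uses uint_least64_t together with the UINT64_C macro from stdint.h. The names mask64_t, mask64_set, mask64_test and mask64_not are hypothetical and not part of any library; the masking in mask64_not only matters on an unusual target where uint_least64_t is wider than 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical typedef: rely only on the type the standard guarantees. */
typedef uint_least64_t mask64_t;

/* Return m with bit n (0..63) set. UINT64_C yields a constant of type
   uint_least64_t, so the shift is never done in a narrower int. */
static mask64_t mask64_set(mask64_t m, unsigned n)
{
    assert(n < 64u);
    return m | (UINT64_C(1) << n);
}

/* Return 1 if bit n (0..63) of m is set, otherwise 0. */
static int mask64_test(mask64_t m, unsigned n)
{
    assert(n < 64u);
    return (int)((m >> n) & 1u);
}

/* Invert the low 64 bits. The explicit mask keeps the result within
   64 bits even if uint_least64_t happens to be wider. */
static mask64_t mask64_not(mask64_t m)
{
    return ~m & UINT64_C(0xFFFFFFFFFFFFFFFF);
}
```

If the exact-width type is wanted where available, its presence can be detected at compile time with `#ifdef UINT64_MAX`, since that macro is defined only when uint64_t exists.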

As for what an 8-bit MCU supports: it supports 8-bit arithmetic through the instruction set, and in some special cases also a few 16-bit operations using index registers. But in general, it must rely on multi-instruction sequences or software libraries whenever a type larger than 8 bits is used.
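To give a rough idea of the work involved, here is a minimal sketch in C of a 64-bit addition performed one byte at a time with a carry, which is roughly what an 8-bit CPU has to do with its add-with-carry instructions. The function name add64_bytewise and the little-endian byte layout are assumptions for illustration; this is not how any particular compiler library is written.

```c
#include <stdint.h>

/* Hypothetical helper: add two 64-bit values, each stored as 8 bytes
   with the least significant byte first, one byte at a time. */
static void add64_bytewise(const uint8_t a[8], const uint8_t b[8],
                           uint8_t sum[8])
{
    unsigned carry = 0;

    for (int i = 0; i < 8; i++) {
        /* At most 0xFF + 0xFF + 1 = 0x1FF, which fits in unsigned. */
        unsigned t = (unsigned)a[i] + (unsigned)b[i] + carry;

        sum[i] = (uint8_t)(t & 0xFFu); /* low 8 bits are the result byte */
        carry  = t >> 8;               /* bit 8 is the carry into the next byte */
    }
    /* A carry out of the top byte is discarded, as with uint64_t wraparound. */
}
```

Bitwise AND, OR and XOR need no carry, so logically they break down into eight independent byte operations; shifts and additions are the ones that have to propagate bits between bytes.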

So if you attempt 32-bit arithmetic on an 8-bitter, the compiler will pull its own software support routines into the code, and the result can be hundreds of assembler instructions, making such code very inefficient and memory-consuming. 64-bit will be even worse.

The same goes for floating point numbers on MCUs that lack an FPU: these too will generate horribly inefficient code through software floating point libraries.

To illustrate, take a look at this non-optimized code for some very simple 64 bit addition on an 8-bitter AVR (gcc): https://godbolt.org/z/ezbKjY
The compiler does support uint64_t, but it spews out an enormous amount of overhead code, some 100 instructions. In the middle of it there is a call to an internal compiler helper function, __adddi3, which is linked into the executable.

If we enable optimizations, we get

add64:
        push r10
        push r11
        push r12
        push r13
        push r14
        push r15
        push r16
        push r17
        call __adddi3
        pop r17
        pop r16
        pop r15
        pop r14
        pop r13
        pop r12
        pop r11
        pop r10
        ret

We would have to dig through the library source or single-step the assembly live to see how much code there is inside __adddi3. I would guess that it is still not a trivial function.
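For reference, a function of roughly the following shape would be expected to give a listing like the one above. This is an assumption: the actual source is only available through the Compiler Explorer link, and only the name add64 is taken from the assembly label.

```c
#include <stdint.h>

/* Assumed source for the add64 listing: a plain 64-bit addition.
   On an 8-bit target the compiler cannot do this in one instruction,
   so it either expands it inline or calls a helper such as __adddi3. */
uint64_t add64(uint64_t a, uint64_t b)
{
    return a + b;
}
```

Compiling this for an AVR target with and without optimizations is one way to check the instruction counts mentioned here against a specific compiler version.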

So as you hopefully can tell, doing 64 bit arithmetic on an 8-bit CPU is a very bad idea.