Tags: c, optimization, embedded, c99, stdint

In an embedded MCU application, is it better to use uint_fast16_t or size_t in for loops?


I would like to write portable code for applications that will run on different MCUs (8-bit, 16-bit, 32-bit or 64-bit cores), for example:

  • MSP-430
  • nRF52 (32-bit)
  • PIC (16-bit)
  • C51 (8-bit)

Let's consider this snippet:

events = 0;
for (size_t i = 0; i < sizeof(array) / sizeof(array[0]); i++) {
    if (array[i] > threshold) 
        events++;
}

My question concerns the type of the loop counter variable, which here is size_t.

Usually size_t is large enough to address all the memory of my system. So using size_t might hurt the performance of my code on some architectures, because the variable is wider than the length of my array requires.

Under this assumption, I would be better off using uint_fast16_t, because I know that my array has fewer than 65k elements.

Does it make sense to worry about this, or is my compiler smart enough to optimize it?

I think uint_fast16_t is rarely used and rather verbose compared with size_t.

To be more specific about my question:

Do I improve the portability of my code by systematically using the proper type for my loop counter (uint_fast8_t, uint_fast16_t, ...), or should I prefer size_t because in most cases it will make no difference in terms of performance?
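To make the two options concrete, here is a minimal sketch of the loop from the question written with a uint_fast16_t counter. The array contents, the names samples, SAMPLES_LEN and count_events, and the C99-style compile-time check are all assumptions added for illustration; they are not part of the original snippet.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample data, used only for illustration. */
static const int samples[] = { 12, 7, 300, 45, 1000, 3 };
#define SAMPLES_LEN (sizeof(samples) / sizeof(samples[0]))

/* C99-compatible compile-time check: the array size becomes negative
 * (a compile error) if the array is too long for a 16-bit counter.
 * uint_fast16_t is guaranteed to hold at least 0..65535. */
typedef char samples_len_fits_counter[(SAMPLES_LEN <= 65535u) ? 1 : -1];

/* Counts the elements strictly greater than threshold.
 * The counter type lets the compiler pick whatever width is fastest
 * on the target, as long as it is at least 16 bits. */
static uint_fast16_t count_events(const int *data, uint_fast16_t len, int threshold)
{
    uint_fast16_t events = 0;
    for (uint_fast16_t i = 0; i < len; i++) {
        if (data[i] > threshold) {
            events++;
        }
    }
    return events;
}
```

A call such as count_events(samples, (uint_fast16_t)SAMPLES_LEN, 40) would then replace the inline loop. Swapping uint_fast16_t for size_t changes nothing else in the code, so comparing the generated assembly for both variants on each target is cheap.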

EDIT

Following your comments and remarks, it is clear that most of the time the compiler will keep the loop counter in a register, so choosing between size_t and uint_fast8_t does not matter much.

https://godbolt.org/g/pbPCrf

main: # @main
  mov rax, -80
  mov ecx, dword ptr [rip + threshold]
.LBB0_1: # =>This Inner Loop Header: Depth=1
  [....]
.LBB0_5: # in Loop: Header=BB0_1 Depth=1
  add rax, 8     # <----------- Kept in a register
  jne .LBB0_1
  jmp .LBB0_6
.LBB0_2: # in Loop: Header=BB0_1 Depth=1
  [....]
.LBB0_6:
  xor eax, eax
  ret

This could become a real issue if the loop count exceeds what a single CPU register can hold, e.g. a loop of 512 iterations on an 8-bit microcontroller.
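The 512-iteration case can be sketched as follows. This is only an illustration under the assumption that the toolchain provides uint8_t (it is an optional type in C99); the function names are hypothetical. It shows that an 8-bit counter wraps after 256 increments, so it can never reach a bound of 512, whereas a uint_fast16_t counter can, at the price of multi-byte arithmetic on an 8-bit core.

```c
#include <stdint.h>

/* Increments a uint8_t counter until it wraps back to 0 and returns
 * the number of increments performed. Unsigned wrap-around is well
 * defined in C, so this terminates. */
static unsigned increments_before_wrap(void)
{
    uint8_t i = 0;
    unsigned n = 0;
    do {
        i++;   /* converted back to uint8_t, i.e. reduced modulo 256 */
        n++;
    } while (i != 0);
    return n;
}

/* Runs a loop with a uint_fast16_t counter and returns how many
 * iterations were executed. Any limit up to 65535 is safe. */
static unsigned iterations_with_fast16(uint_fast16_t limit)
{
    unsigned n = 0;
    for (uint_fast16_t i = 0; i < limit; i++) {
        n++;
    }
    return n;
}
```

With a uint8_t counter, a condition like i < 512 would always be true, so the loop would never terminate; that is why the counter type has to be chosen from the loop bound and not from the register width.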


Solution

  • For MCUs, use the smallest type that you know can index the whole array. If you know that the array may have more than 255 elements, but never more than 65535, then uint_fast16_t is indeed the most appropriate type.

    Your main issue here will be 16-bit MCUs with extended flash (>64 KiB), giving a 24- or 32-bit address width. This is the norm for many 16 bitters. On such systems, size_t will be 32 bits, and will therefore be a slow type to use.

    If portability to 8 or 16 bitters is no concern, then I would use size_t.

    In my experience, many embedded compilers are not smart enough to optimize code to use a smaller type than the stated one, even though they can deduce the array size at compile time.
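One way to apply the advice above without repeating the type decision at every loop is to derive an index type from the array bound with the preprocessor. The following is a sketch under assumptions: BUF_LEN, buf_index_t and count_above are hypothetical names, and the bound must be a plain integer constant for the #if to work.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN 512u   /* hypothetical buffer length */

/* Pick the narrowest "fast" type that can count up to BUF_LEN.
 * uint_fast8_t is only guaranteed to reach 255 and uint_fast16_t
 * 65535, so the literal limits are used instead of sizeof. */
#if BUF_LEN <= 255u
typedef uint_fast8_t buf_index_t;
#elif BUF_LEN <= 65535u
typedef uint_fast16_t buf_index_t;
#else
typedef size_t buf_index_t;
#endif

/* Counts the bytes in a BUF_LEN-sized buffer that exceed threshold. */
static buf_index_t count_above(const unsigned char *buf, unsigned char threshold)
{
    buf_index_t events = 0;
    for (buf_index_t i = 0; i < BUF_LEN; i++) {
        if (buf[i] > threshold) {
            events++;
        }
    }
    return events;
}
```

The type then follows the bound automatically if BUF_LEN changes, which sidesteps the case where the compiler does not narrow a size_t counter on its own.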