Search code examples
c++c++11templatesstdarraycode-size

Will std::array template instances occupy more code memory?


I have a micro-controller that does not have an MMU, but we are using C and C++.

We are avoiding all dynamic memory usage (i.e. no new SomeClass() or malloc()) and most of the standard library.

Semi-Question 0:

From what I understand std::array does not use any dynamic memory so its usage should be OK (It is on the stack only). Looking at std::array source code, it looks fine since it creates a c-style array and then wraps functionality around that array.

The chip we are using has 1MB of flash memory for storing code.

Question 1:

I am worried that the use of templates in std::array will cause the binary to be larger, which will then potentially cause the binary to exceed the 1MB code memory limit.

I think if you create an instance of a std::array< int, 5 >, then all calls to functions on that std::array will occupy a certain amount of code memory, lets say X bytes of memory.

If you create another instance of std::array< SomeObject, 5 >, then call functions to that std::array, will each of those functions now be duplicated in the binary, thus taking up more code memory? X bytes of memory + Y bytes of memory.

If so, do you think the amount of code generated given the limited code memory capacity will be a concern?

Question 2:

In the above example, if you created a second std::array< int, 10 > instance, would the calls to functions also duplicate the function calls in the generated code? Even though both instances are of the same type, int?


Solution

  • std::array is considered a zero cost abstraction, which means it should be fairly optimizable by the compiler.

    As of any zero cost abstraction, it may induce a small compile time penality, and if the opimizations required te be truely zero cost are not supported, then it may incur a small size or runtime penality.

    However, note that compiler are free to add padding at the end of a struct. Since std::array is a struct, you should check how your platform is handling std::array, but I highly doubt it's the case for you.

    Take this array and std::array case:

    #include <numeric>
    #include <iterator>
    
    template<std::size_t n>
    int stuff(const int(&arr)[n]) {
        return std::accumulate(std::begin(arr), std::end(arr), 0);
    }
    
    int main() {
        int arr[] = {1, 2, 3, 4, 5, 6};
        return stuff(arr);
    }
    

    #include <numeric>
    #include <iterator>
    #include <array>
    
    template<std::size_t n>
    int stuff(const std::array<int, n>& arr) {
        return std::accumulate(std::begin(arr), std::end(arr), 0);
    }
    
    int main() {
        std::array arr = {1, 2, 3, 4, 5, 6};
        return stuff(arr);
    }
    

    Clang support this case very well. all cases with std::array or raw arrays are handleld the same way:

    -O2 / -O3 both array and std::array with clang:

    main: # @main
      mov eax, 21
      ret
    

    However, GCC seem to have a problem optimizing it, for bith the std::array and the raw array case:

    -O3 with GCC for array and std::array:

    main:
      movdqa xmm0, XMMWORD PTR .LC0[rip]
      movaps XMMWORD PTR [rsp-40], xmm0
      mov edx, DWORD PTR [rsp-32]
      mov eax, DWORD PTR [rsp-28]
      lea eax, [rdx+14+rax]
      ret
    .LC0:
      .long 1
      .long 2
      .long 3
      .long 4
    

    Then, it seem to optimize better with -O2 in the case of raw array and fail with std::array:

    -O2 GCC std::array:

    main:
      movabs rax, 8589934593
      lea rdx, [rsp-40]
      mov ecx, 1
      mov QWORD PTR [rsp-40], rax
      movabs rax, 17179869187
      mov QWORD PTR [rsp-32], rax
      movabs rax, 25769803781
      lea rsi, [rdx+24]
      mov QWORD PTR [rsp-24], rax
      xor eax, eax
      jmp .L3
    .L5:
      mov ecx, DWORD PTR [rdx]
    .L3:
      add rdx, 4
      add eax, ecx
      cmp rdx, rsi
      jne .L5
      rep ret 
    

    -O2 GCC raw array:

    main:
      mov eax, 21
      ret
    

    It seem that the GCC bug failling to optimize -O3 but succeed with -O2 is fixed in the most recent build.

    Here's a compiler explorer with all the O2 and the O3


    With all these cases stated, you can see a common pattern: No information about the std::array is outputted in the binary. There are no constructors, no operator[], not even iterators, nor algorithms. Everything is inlined. Compiler are good at inlining simple functions. std::array member functions are usually very very simple.

    If you create another instance of std::array< SomeObject, 5 >, then call functions to that std::array, will each of those functions now be duplicated in the binary, thus taking up more flash memory? X bytes of memory + Y bytes of memory.

    Well, you changed the data type your array is containing. If you manually add overload of all your functions to handle this additional case, then yes, all those new functions may take up some space. If your function are small, there is a great chance for them to be inlined and take less space. As you can see with the example above, inlining and constant folding may greatly reduce your binary size.

    In the above example, if you created a second std::array instance, would the calls to functions also duplicate the function calls in flash memory? Even though both instances are of the same type, int?

    Again it depends. If you have many function templated in the size of the array, both std::array and raw arrays may "create" different function. But again, if they are inlined, there is no duplicate to be worried about.

    Both will a raw array and std::array, you can pass a pointer to the start of the array and pass the size. If you find this more suitable for your case, then use that, but still raw array and std::array can do that. For raw array, it implicitly decays to a pointer, and with std::array, you must use arr.data() to get the pointer.